Preface



SAC 2001, the eighth annual workshop on selected areas in cryptography, was
held at the Fields Institute in Toronto, Ontario, Canada. Previous SAC
workshops were held at Queen’s University in Kingston (1994, 1996, 1998, and
1999), at Carleton University in Ottawa (1995 and 1997), and at the University
of Waterloo (2000). The conference was sponsored by the Centre for Applied
Cryptographic Research (CACR) at the University of Waterloo, Certicom
Corporation, Communications and Information Technology Ontario (CITO), École
Polytechnique Fédérale de Lausanne, Entrust Technologies, and ZeroKnowledge.
We are grateful to these organizations for their support of the conference.

The current SAC board includes Carlisle Adams, Doug Stinson, Ed Dawson, 
Henk Meijer, Howard Heys, Michael Wiener, Serge Vaudenay, Stafford Tavares, 
and Tom Cusick. We would like to thank all of them for giving us the mandate 
to organize SAC 2001. 

The themes for the SAC 2001 workshop were:

— Design and analysis of symmetric key cryptosystems. 

— Primitives for private key cryptography, including block and stream ciphers, 
hash functions, and MACs. 

— Efficient implementations of cryptographic systems in public and private key 
cryptography. 

— Cryptographic solutions for web and internet security. 

There were 57 technical papers submitted to the conference from an inter- 
national authorship. Every paper was refereed by at least 3 reviewers and 25 
papers were accepted for presentation at the conference. We would like to thank 
the authors of all the submitted papers, both those whose work is included in 
these proceedings, and those whose work could not be accommodated. 

In addition to these 25 papers, two invited presentations were given at the
conference: one by Moti Yung from CertCo, USA, entitled “Polynomial
Reconstruction Based Cryptography”, and the other by Phong Nguyen from the
École Normale Supérieure, France, entitled “The two faces of lattices in
cryptology”. Thanks to both Moti and Phong for their excellent talks and for
kindly accepting our invitation.

The program committee for SAC 2001 consisted of the following members: 
Stefan Brands, Matt Franklin, Henri Gilbert, Howard Heys, Hideki Imai, Shiho 
Moriai, Kaisa Nyberg, Rich Schroeppel, Doug Stinson, Stafford Tavares, Serge 
Vaudenay, Michael Wiener, Amr Youssef, and Yuliang Zheng.

On behalf of the program committee we would like to thank the following 
sub-referees for their help in the reviewing process: Joonsang Baek, Guang Gong, 
Ian Goldberg, Darrel Hankerson, Keiichi Iwamura, Mike Just, Masayuki Kanda, 
Liam Keliher, Mira Kim, Kazukuni Kobara, Frederic Legare, Henk Meijer, Al- 
fred John Menezes, Miodrag Mihaljevic, Ulf Moller, Dalit Naor, Daisuke No- 
jiri, Mohammad Ghulam Rahman, Palash Sarkar, Akashi Satoh, Junji Shikata, 
Takeshi Shimoyama, Ron Steinfeld, Anton Stiglic, Edlyn Teske, Yodai Watan- 
abe, Huapeng Wu, Daichi Yamane, and Robert Zuccherato. 

We would like to thank all the people involved in organizing the conference.
In particular we would like to thank Pascal Junod for his effort in making the 
reviewing process run smoothly. Special thanks are due to Frances Hannigan 
for her help in the local arrangements and for making sure that everything ran 
smoothly during the workshop. Finally we would like to thank all the partici- 
pants of SAC 2001. 

August 2001 Serge Vaudenay and Amr Youssef 

Abstract. In this paper we present several weaknesses in the key scheduling
algorithm of RC4, and describe their cryptanalytic significance. We
identify a large number of weak keys, in which knowledge of a small 
number of key bits suffices to determine many state and output bits 
with non-negligible probability. We use these weak keys to construct 
new distinguishers for RC4, and to mount related key attacks with prac- 
tical complexities. Finally, we show that RC4 is completely insecure in a 
common mode of operation which is used in the widely deployed Wired 
Equivalent Privacy protocol (WEP, which is part of the 802.11 standard), 
in which a fixed secret key is concatenated with known IV modifiers in 
order to encrypt different messages. Our new passive ciphertext-only at- 
tack on this mode can recover an arbitrarily long key in a negligible 
amount of time which grows only linearly with its size, both for 24 and 
128 bit IV modifiers. 



1 Introduction 

RC4 is the most widely used stream cipher in software applications. It was
designed by Ron Rivest in 1987 and kept as a trade secret until it leaked out in
1994. RC4 has a secret internal state which is a permutation of all the
$N = 2^n$ possible $n$-bit words, along with two indices in it. In practical
applications $n = 8$, and thus RC4 has a huge state of
$\log_2(2^8! \times (2^8)^2) \approx 1700$ bits.
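
The state-size figure can be checked numerically. The following is a minimal sketch, not part of the original paper; the function name `rc4_state_bits` is an illustrative assumption. It counts one permutation of $N = 2^n$ words plus two $n$-bit indices.

```python
import math

def rc4_state_bits(n: int) -> float:
    """Return log2(N! * N^2) for N = 2**n (hypothetical helper name)."""
    N = 2 ** n
    # log2(N!) computed as a sum of logarithms to avoid huge integers.
    log2_factorial = sum(math.log2(k) for k in range(1, N + 1))
    # Two indices i and j contribute n bits each.
    return log2_factorial + 2 * n

print(round(rc4_state_bits(8)))  # roughly 1700 bits for n = 8
```

The permutation alone accounts for about 1684 bits, which is the figure used later for the number of permutation bits.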

In this paper we analyze the Key Scheduling Algorithm (KSA) which derives
the initial state from a variable size key, and describe two significant
weaknesses of this process. The first weakness is the existence of large classes
of weak keys, in which a small part of the secret key determines a large number
of bits of the initial permutation (KSA output). In addition, the Pseudo Random
Generation Algorithm (PRGA) translates these patterns in the initial permutation
into patterns in the prefix of the output stream, and thus RC4 has the
undesirable property that for these weak keys its initial outputs are
disproportionally affected by a small number of key bits. These weak keys have
length which is divisible by some non-trivial power of two, i.e.,
$\ell = 2^q m$ for some $q > 0$¹. When $\text{RC4}_n$ uses such a weak key of
$\ell$ words, fixing $n + q(\ell - 1) + 1$ bits of $K$ (as a particular pattern)
determines $\Theta(qN)$ bits of the initial permutation with probability of one
half and determines various prefixes of the output stream with various
probabilities (depending on their length).

The second weakness is a related key vulnerability, which applies when part 
of the key presented to the KSA is exposed to the attacker. It consists of the 
observation that when the same secret part of the key is used with numerous 
different exposed values, an attacker can rederive the secret part by analyzing 
the initial word of the keystreams with relatively little work. This concatena- 
tion of a long term secret part with an attacker visible part is a commonly used 
mode of RC4, and in particular it is used in the WEP (Wired Equivalent Pri- 
vacy) protocol, which protects many wireless networks. Our new attack on this 
mode is practical for any key size and for any modifier size, including the 24 bit 
recommended in the original WEP, and the 128 bit recommended in the revised 
version WEP 2. 

The paper is organized in the following way: In Section 2 we describe RC4 
and previous results about its security. In Section 3 we consider a slightly mod- 
ified variant of the Key Scheduling Algorithm, called KSA*, and prove that a 
particular pattern of a small number of key bits suffices to completely determine 
a large number of state bits. Afterwards, we show that this weakness of KSA*, 
which we denote as the invariance weakness, exists (in a weaker form) also in 
the original KSA. In Section 4 we show that with high probability, the patterns 
of initial states associated with these weak keys also propagate into the first 
few outputs, and thus a small number of weak key bits determine a large num- 
ber of bits in the output stream. In Section 5 we describe several cryptanalytic 
applications of the invariance weakness, including a new type of distinguisher. 
In Sections 6 and 7 we describe the second weakness, which we denote as the 
IV weakness, and show that a common method of using RC4 is vulnerable to 
a practical attack due to this weakness. In Section 8, we show how both these 
weaknesses can separately be used in a related key attack. In the appendices, we 
examine how the IV weakness can be used to attack a real system (appendix A),
how the invariance weakness can be used to construct a ciphertext-only
distinguisher and to prove that RC4 has low sampling resistance (appendices B
and C), and how to derive the secret key from an early permutation state
(appendix D).

2 RC4 and Its Security 

2.1 Description of RC4 

RC4 consists of two parts (described in Figure 1): A key scheduling algorithm 
KSA which turns a random key (whose typical size is 40-256 bits) into an initial 

¹ Here and in the rest of the paper, $\ell$ is the number of words of $K$, where
each word contains $n$ bits.

    KSA(K)                                PRGA(K)
    Initialization:                       Initialization:
      For i = 0 ... N - 1                   i = 0
        S[i] = i                            j = 0
      j = 0                               Generation loop:
    Scrambling:                             i = i + 1
      For i = 0 ... N - 1                   j = j + S[i]
        j = j + S[i] + K[i mod ℓ]           Swap(S[i], S[j])
        Swap(S[i], S[j])                    Output z = S[S[i] + S[j]]


Fig. 1. The Key Scheduling Algorithm and the Pseudo-Random Generation Algorithm 



permutation $S$ of $\{0, \ldots, N-1\}$, and an output generation part PRGA which
uses this permutation to generate a pseudo-random output sequence.

The PRGA initializes two indices $i$ and $j$ to 0, and then loops over four
simple operations which increment $i$ as a counter, increment $j$
pseudo-randomly, exchange the two values of $S$ pointed to by $i$ and $j$, and
output the value of $S$ pointed to by $S[i] + S[j]$. Note that every entry of $S$
is swapped at least once (possibly with itself) within any $N$ consecutive
rounds, and thus the permutation $S$ evolves fairly rapidly during the output
generation process.

The KSA consists of $N$ loops that are similar to the PRGA round operation.
It initializes $S$ to be the identity permutation and $i$ and $j$ to 0, and
applies the PRGA round operation $N$ times, stepping $i$ across $S$, and
updating $j$ by adding $S[i]$ and the next word of the key (in cyclic order).
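
The two algorithms of Figure 1 can be sketched in a few lines. This is an illustrative transcription for $n = 8$ rather than the authors' own code; the function names `ksa` and `prga` and the byte-oriented key representation are assumptions made here.

```python
def ksa(key, n=8):
    """Key Scheduling Algorithm: turn a key (list of n-bit words) into S."""
    N = 1 << n
    S = list(range(N))          # identity permutation
    j = 0
    for i in range(N):
        # All additions are carried out modulo N.
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def prga(S, count):
    """Pseudo-Random Generation Algorithm: produce `count` output words."""
    N = len(S)
    S = list(S)                 # work on a copy of the initial permutation
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % N])
    return out

# Encrypt a short message by XORing it with the keystream.
keystream = prga(ksa(list(b"Key")), 9)
ciphertext = bytes(k ^ p for k, p in zip(keystream, b"Plaintext"))
print(ciphertext.hex())
```

The key "Key" and message "Plaintext" are a commonly quoted RC4 check pair; they are used here only to confirm that the sketch matches standard RC4.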



2.2 Previous Attacks on RC4 

Due to the huge effective key of RC4, attacking the PRGA seems to be
infeasible (the best known attack on this part requires time that exceeds
$2^{700}$). The only practical results related to the PRGA deal with the
construction of distinguishers. Fluhrer and McGrew described in [FM00] how to
distinguish RC4 outputs from random strings with $2^{30.6}$ data. A better
distinguisher which requires $2^8$ data was described by Mantin and Shamir in
[MS01]. However, this distinguisher could only be used to mount a partial attack
on RC4 in broadcast applications.

The fact that the initialization of RC4 is very simple stimulated considerable
research on this mechanism of RC4. In particular, Roos discovered in [Roo95] a
class of weak keys that reduces their effective size by five bits, and Grosul and
Wallach showed in [GW00] that for large keys whose size is close to $N$ words,
RC4 is vulnerable to a related key attack.



More analysis of the security of RC4 can be found in [KMP+98], [Gol97] and
[MT98].

² Here and in the rest of the paper, all additions are carried out modulo $N$.

3 The Invariance Weakness 

Due to space limitations we prove here the invariance weakness only for a sim- 
plified variant of the KSA, which we denote as KSA* and describe in Figure 2. 
The only difference between them is that KSA* updates i at the beginning of 
the loop, whereas KSA updates i at the end of the loop. After formulating and 
proving the existence of this weakness in KSA*, we describe the modifications 
required to apply this analysis to the real KSA. 

3.1 Definitions 

We start the round numbering from 0, which means that both KSA and KSA* have
rounds $0, \ldots, N-1$. We denote the indices swapped in round $r$ by $i_r$ and
$j_r$, and the permutation $S$ after swapping these indices is denoted as $S_r$.
Notice that by using this notation, $i_r = r$ in the real KSA. However, in KSA*
this notation becomes somewhat confusing, since $i_r = r + 1$. For the sake of
completeness, we can say that $j_{-1} = 0$, $S_{-1}$ is the identity permutation
and

$$i_{-1} = \begin{cases} -1 & \text{KSA} \\ 0 & \text{KSA*} \end{cases}$$

Definition 1. Let $S$ be a permutation of $\{0, \ldots, N-1\}$, $t$ be an index
in $S$ and $b$ be some integer. Then if $S[t] \equiv t \pmod{b}$, the permutation
$S$ is said to $b$-conserve the index $t$. Otherwise, the permutation $S$ is said
to $b$-unconserve the index $t$.

Definition 2. A permutation $S$ of $\{0, \ldots, N-1\}$ is $b$-conserving if
$I_b(S) = N$, and is almost $b$-conserving if $I_b(S) \ge N - 2$.



    KSA(K)ᵃ                               KSA*(K)
    For i = 0 ... N - 1                   For i = 0 ... N - 1
      S[i] = i                              S[i] = i
    i = 0                                 i = 0
    j = 0                                 j = 0
    Repeat N times                        Repeat N times
      j = j + S[i] + K[i mod ℓ]             i = i + 1
      Swap(S[i], S[j])                      j = j + S[i] + K[i mod ℓ]
      i = i + 1                             Swap(S[i], S[j])

    ᵃ KSA is rewritten in a way which clarifies its relation to KSA*

Fig. 2. KSA vs. KSA*

We denote the number of indices that a permutation $b$-conserves as $I_b(S)$. To
simplify the notation, we often write $I_r$ instead of $I_b(S_r)$.

Definition 3. Let $b$, $\ell$ be integers, and let $K$ be an $\ell$ word key.
Then $K$ is called a $b$-exact key if for any index $r$,
$K[r \bmod \ell] \equiv (1 - r) \pmod{b}$. In case $K[0] = 1$ and
$\text{MSB}(K[1]) = 1$, $K$ is called a special $b$-exact key.

Notice that for this condition to hold, it is necessary (but not sufficient) that
$b \mid \ell$.

3.2 The Weakness 

Theorem 1. Let $q < n$ and $\ell$ be integers and
$b \stackrel{\text{def}}{=} 2^q$. Suppose that $b \mid \ell$ and let $K$ be a
$b$-exact key of $\ell$ words. Then the permutation $S = \text{KSA*}(K)$ is
$b$-conserving.

Before getting to the proof itself, we will prove an auxiliary lemma.

Lemma 1. If $i_{r+1} \equiv j_{r+1} \pmod{b}$, then $I_{r+1} = I_r$.

Proof. The only operation that might affect $S$ (and maybe $I$) is the swapping
operation. However, when $i$ and $j$ are equivalent (mod $b$) in round $r + 1$,
$S_{r+1}$ $b$-conserves position $i_{r+1}$ ($j_{r+1}$) if and only if $S_r$
$b$-conserved position $j_{r+1}$ ($i_{r+1}$). Thus the number of indices $S$
$b$-conserves remains the same.

Proof (of Theorem 1). We will prove by induction on $r$ that for any
$-1 \le r \le N-1$, it turns out that $i_r \equiv j_r \pmod{b}$ and
$I_b(S_r) = N$. This in particular implies that $I_{N-1} = N$, which makes the
output permutation $b$-conserving.

For $r = -1$ (before the first round), the claim is trivial because
$i_{-1} = j_{-1} = 0$ and $S_{-1}$ is the identity permutation, which is
$b$-conserving for every $b$. Suppose that $j_r \equiv i_r \pmod{b}$ and $S_r$ is
$b$-conserving. Then $i_{r+1} = i_r + 1$ and

$$j_{r+1} = j_r + S_r[i_{r+1}] + K[i_{r+1} \bmod \ell] \equiv i_r + i_{r+1} + (1 - i_{r+1}) = i_r + 1 = i_{r+1} \pmod{b}$$

Thus, $i_{r+1} \equiv j_{r+1} \pmod{b}$, and by applying Lemma 1 we get
$I_{r+1} = I_r = N$, and therefore $S_{r+1}$ is $b$-conserving.

KSA* thus transforms special patterns in the key into corresponding pat- 
terns in the initial permutation. The fraction of determined permutation bits is 
proportional to the fraction of fixed key bits. For example, applying this result
to $\text{RC4}_{n=8,\ell=6}$ and $q = 1$, 6 out of the 48 key bits completely
determine 252 out of the 1684 permutation bits (this is the number of bits
encapsulated in the LSBs).
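
Theorem 1 is deterministic, so it can be checked directly by running KSA* on randomly chosen $b$-exact keys. The sketch below is illustrative only: the helper names (`ksa_star`, `random_b_exact_key`, `conserved_count`) and the choice $n = 8$, $\ell = 16$ are assumptions made here, not notation from the paper.

```python
import random

def ksa_star(key, n=8):
    """KSA* of Figure 2: i is advanced at the beginning of each round."""
    N = 1 << n
    S = list(range(N))
    i = j = 0
    for _ in range(N):
        i = (i + 1) % N
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def random_b_exact_key(length, q, n=8, rng=random):
    """Random key with K[r] = (1 - r) mod b in its q low bits (b = 2**q)."""
    b, N = 1 << q, 1 << n
    assert length % b == 0, "a b-exact key needs b | length"
    # The high n - q bits of every word are free; the low q bits are fixed.
    return [rng.randrange(N // b) * b + (1 - r) % b for r in range(length)]

def conserved_count(S, b):
    """I_b(S): number of indices t with S[t] = t (mod b)."""
    return sum(1 for t, v in enumerate(S) if v % b == t % b)

rng = random.Random(2001)
for q in (1, 2, 3, 4):
    key = random_b_exact_key(16, q, rng=rng)
    print(q, conserved_count(ksa_star(key), 1 << q))  # I_b(S) for each q
```

Under the theorem, every printed count should equal $N = 256$ regardless of the free high bits of the key.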

3.3 Adjustments to KSA 

The small difference between KSA* and KSA (see Figure 2) is essential in that
KSA, applied to a $b$-exact key, does not preserve the equivalence (mod $b$) of
$i$ and $j$ even after the first round. Analyzing its execution on a $b$-exact
key gives

$$j_0 = j_{-1} + S_{-1}[i_0] + K[i_0] = 0 + S_{-1}[0] + K[0] = K[0] \equiv 1 \not\equiv 0 = i_0 \pmod{b}$$

and thus the structure described in Section 3.2 cannot be preserved by the cyclic
use of the key words. However, it is possible to adjust the invariance weakness 
to the real KSA, and the proper modifications are formulated in the following 
theorem: 

Theorem 2. Let $q < n$ and $\ell$ be integers and
$b \stackrel{\text{def}}{=} 2^q$. Suppose that $b \mid \ell$ and let $K$ be a
special $b$-exact key of $\ell$ words. Then

$$\Pr[\text{KSA}(K) \text{ is almost } b\text{-conserving}] \ge 2/5$$

where the probability is over the rest of the key bits.

Due to space limitations, the formal proof of this theorem (which is based
on a detailed case analysis) will appear only in the full version of this paper.
However, we can explain the intuition behind this theorem by concentrating on
the differences between Theorems 1 and 2, which deal with KSA* and KSA
respectively. During the first round, two deviations from KSA* execution occur.
The first one is the non-equivalence of $i$ and $j$, which is expected to cause
non-equivalent entries to be swapped during the next rounds, thus ruining the
delicate structure that was preserved so well during KSA* execution. The second
deviation is that $S$ $b$-unconserves two of the indices, $i_0 = 0$ and
$j_0 = K[0]$. However, we can cancel the discrepancy between $i$ and $j$ by
forcing $K[0]$ (and $j_0$) to 1. In this case, the discrepancy in $S[j_0]$
($S[1]$) causes an improper value to be added to $j$ in round 1, thus repairing
its non-equivalence to $i$ during this round. At this point there are still two
unconserved indices, and this aberration is dragged across the whole execution
into the resulting permutation. Although these corrupted entries might interfere
with $j$ updates, the pseudo-random $j$ might reach them before they are used to
update $j$ (i.e., before $i$ reaches them), and send them into a region in $S$
where they cannot affect the next values of $j$³. The probability of this lucky
event is amplified by the fact that the corrupted entries are $i_0 = 0$, which is
not touched (by $i$) until the termination of the KSA due to its distance from
the current location of $i$, and $j_1 = 1 + K[1] > N/2$ (recall that
$\text{MSB}(K[1]) = 1$), which is far from the position of $i$ ($i_1 = 1$) and
thus gives $j$ many opportunities to reach it before $i$ does. The probability
of $N/2$ pseudo-random $j$’s to reach an arbitrary value can be bounded from
below by 2/5, and extensive experimentation indicates that this probability is
actually close to one half.
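
The probability in Theorem 2 can be estimated empirically. The following is a rough Monte Carlo sketch under stated assumptions: $n = 8$, $\ell = 16$, a fixed random seed, and helper names (`random_special_b_exact_key`, `estimate_almost_conserving`) that are introduced here for illustration only.

```python
import random

N = 256  # n = 8

def ksa(key):
    """The real KSA of Figure 1."""
    S = list(range(N))
    j = 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def random_special_b_exact_key(length, q, rng):
    """b-exact key (b = 2**q, q <= 7) with K[0] = 1 and MSB(K[1]) = 1."""
    b = 1 << q
    key = [rng.randrange(N // b) * b + (1 - r) % b for r in range(length)]
    key[0] = 1           # consistent with K[0] = 1 (mod b)
    key[1] |= N // 2     # set the most significant bit; low q bits unchanged
    return key

def is_almost_b_conserving(S, b):
    """True if at least N - 2 indices t satisfy S[t] = t (mod b)."""
    return sum(1 for t, v in enumerate(S) if v % b == t % b) >= N - 2

def estimate_almost_conserving(q, length=16, trials=2000, seed=0):
    """Fraction of random special b-exact keys mapped to an almost
    b-conserving permutation by the real KSA."""
    rng = random.Random(seed)
    hits = sum(
        is_almost_b_conserving(
            ksa(random_special_b_exact_key(length, q, rng)), 1 << q)
        for _ in range(trials))
    return hits / trials

print(estimate_almost_conserving(q=1))
```

The printed fraction can be compared against the 2/5 bound of the theorem; the exact value depends on the seed and the number of trials.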

4 Key-Output Correlation 

In this section we will analyze the propagation of the weak key patterns into the
generated outputs. First we prove Claim 4 which deals with the highly biased
behavior of a significantly weakened variant of the PRGA (where the swaps are
avoided), applied to a $b$-conserving permutation. Next, we will argue that the
prefix of the output of the original PRGA is highly correlated to the prefix of
this swapless variant, when applied to the same initial permutation. These facts
imply the existence of biases in the PRGA distribution for these weak keys.

³ If a value is pointed to by $j$ before the swap, it will not be used as $S[i]$
(before the swap) for at least $N - 1$ rounds, and in particular it will not
affect the values of $j$ during these rounds.

Claim. Let RC4* be a weakened variant of RC4 with no swap operations. Let
$q < n$, $b \stackrel{\text{def}}{=} 2^q$ and $S_0$⁴ be a $b$-conserving
permutation. Let $\{X_r\}_{r=1}^{\infty}$ be the output sequence generated by
applying RC4* to $S_0$, and $x_r \stackrel{\text{def}}{=} X_r \bmod b$. Then
the sequence $\{x_r\}_{r=1}^{\infty}$ is independent of the rest of the key bits.

Since there are no swap operations, the permutation does not change and
remains $b$-conserving throughout the generation process. Notice that all the
values of $S$ are known mod $b$, as well as the initial indices
$i = j = 0 \equiv 0 \pmod{b}$, and thus the round operation (and the output
values) can be simulated mod $b$, independently of $S$. Consequently the output
sequence mod $b$ can be predicted, and deeper analysis implies that it is
periodic with period $2b$, as exemplified in Figure 3 for $q = 1$.
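
The argument above can be illustrated with a short simulation of the swapless variant. This is a sketch under the assumption $n = 8$; the helper names `random_b_conserving_permutation` and `swapless_prga_mod_b` are hypothetical and do not appear in the paper.

```python
import random

def random_b_conserving_permutation(b, n=8, rng=random):
    """Random permutation with S[t] = t (mod b) for every index t."""
    N = 1 << n
    S = [0] * N
    for c in range(b):
        positions = list(range(c, N, b))   # indices in residue class c
        values = positions[:]
        rng.shuffle(values)                # permute within the class only
        for pos, val in zip(positions, values):
            S[pos] = val
    return S

def swapless_prga_mod_b(S, b, count):
    """Outputs of RC4* (PRGA without the swap), reduced mod b."""
    N = len(S)
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % N
        j = (j + S[i]) % N
        out.append(S[(S[i] + S[j]) % N] % b)
    return out

rng = random.Random(4)
b = 4
x1 = swapless_prga_mod_b(random_b_conserving_permutation(b, rng=rng), b, 32)
x2 = swapless_prga_mod_b(random_b_conserving_permutation(b, rng=rng), b, 32)
print(x1 == x2)                                        # same for both
print(all(x1[r] == x1[r + 2 * b] for r in range(32 - 2 * b)))  # period 2b
```

Since $S[t] \equiv t \pmod{b}$, round $r$ has $i \equiv r$ and $j \equiv r(r+1)/2$, so the output is congruent to $r(r+3)/2$ mod $b$, which does not depend on the particular permutation.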

[Figure 3: table of the RC4* rounds for $q = 1$ not recoverable from the source.
Surviving column headers: $S[i]$, $S[j]$, $S[i] + S[j]$, Out.]


Fig. 3. The rounds of RC4*, applied to 
a 2-conserving permutation 




Fig. 4. The stage in which each one of 
the bits is exposed during the related key 
attack 



Recall that at each round of the PRGA, $S$ changes in at most two locations,
and thus we can expect the prefix of the output stream generated by RC4 from
some permutation $S_0$ to be highly correlated with the stream generated from
the same $S_0$ (or a slightly modified one) by RC4*. In particular the stream
generated by RC4 from an almost $b$-conserving permutation is expected to be
highly correlated with the (predictable) substream $\{x_r\}$ from Claim 4. This
correlation is demonstrated in Figure 8, where the function
$h \mapsto \Pr[\forall\, 1 \le r \le h:\ Z_r \equiv x_r \bmod 2^q]$ (for special
$2^q$-exact keys) is empirically estimated for $n = 8$, $\ell = 16$ and
different $q$’s. For example, a special 2-exact key completely determines
20 output bits (the LSBs of the first 20 outputs) with probability $2^{-4.2}$
instead of $2^{-20}$, and a special 16-exact key completely determines 40 output
bits (4 LSBs from each of the first 10 outputs) with probability $2^{-2.3}$,
instead of $2^{-40}$.

⁴ The term $S_0$ is used here for the common purpose of indicating the initial
permutation of the PRGA.

We have thus demonstrated a strong probabilistic correlation between some
bits of the secret key and some bits of the output stream for a large class of weak 
keys. In the next section we describe how to use this correlation to cryptanalyze 
RC4. 

5 Cryptanalytic Applications of the Invariance Weakness 

5.1 Distinguishing RC4 Streams from Randomness 

In [MS01] Mantin and Shamir described a significant statistical bias in the
second output word of RC4. They used this bias to construct an efficient
algorithm which distinguishes between RC4 outputs and truly random sequences by
analyzing only one word from $O(N)$ different output streams. This is an
extremely efficient distinguisher, but it can be easily avoided by discarding
the first two words from each output stream. If these two words are discarded,
the best known distinguisher requires about $2^{30.6}$ output words (see
[FM00]). Our new observation yields a significantly better distinguisher for
most of the typical key sizes. The new distinguisher is based on the fact that
for a significant fraction of keys, a significant number of initial output words
contain an easily recognizable pattern. This bias is flattened when the keys are
chosen from a uniform distribution, but it does not completely disappear and can
be used to construct an efficient distinguisher even when the first two words of
each output sequence are discarded.

Notice that the probability of a special $2^q$-exact key to be transformed into
a $2^q$-conserving permutation does not depend on the key length $\ell$ (see
Theorem 2). However, the number of predetermined bits is linear in $\ell$, and
consequently the size of this bias (and thus the number of required outputs)
also depends on $\ell$. In Figure 5 we specify the quantity of data (or actually
the number of different streams) required for a reliable distinguisher, for
different key sizes. In particular, for 64 bit keys the new distinguisher
requires only $2^{21}$ data instead of the previously best number of $2^{30.6}$
output words.

It is important to notice that the specified output patterns extend over several
dozen output words, and thus the quality of the distinguisher is almost
unaffected by discarding the first few words. For example, discarding the first
two words causes the data required for the distinguisher to grow by a factor of
between $2^{0.5}$ and $2^2$ (depending on $\ell$). Another important observation
is that the biases in the LSBs distribution can be combined in a natural way
with the biased distribution of the LSBs of English texts into an efficient
distinguisher of RC4 streams from randomness in a ciphertext-only attack in
which the attacker does not know the actual English plaintext which was
encrypted by RC4. This type of distinguisher is discussed in Appendix B.



5.2 RC4 Has Low Sampling Resistance 

Biryukov, Shamir and Wagner defined in [BSW00] a new security measure of
stream ciphers, which they denoted as their Sampling Resistance. The strong
correlation between classes of RC4 keys and corresponding output patterns can
be used to prove that RC4 has relatively low sampling resistance, which improves
the efficiency of time/memory/data tradeoff attacks. Further details can be found
in Appendix C.

[Figure 5: table not recoverable from the source. Surviving table notes: $k_2$ is
the number of determined output bits; $p$ is the probability of these $k_1$ key
bits to determine these $k_2$ output bits (taken from Figure 8).]

Fig. 5. Data required for a reliable distinguisher, for different key sizes



6 RC4 Key Setup and the First Word Output 

In this section, we consider related key attacks where the attacker has access to 
the values of all the bits of certain words of the key. In particular, we consider 
the case where the key presented to the KSA is made up of a secret key concate- 
nated with an attacker visible value (which we will refer to as an Initialization 
Vector or IV). We will show that if the same secret key is used with numerous 
different initialization vectors, and the attacker can obtain the first word of RC4 
output corresponding to each initialization vector, he can reconstruct the secret 
key with minimal effort. How often he can do this, the amount of effort and the 
number of initialization vectors required depends on the order of the concate- 
nation, the size of the IV, and sometimes on the value of the secret key. This 
observation is especially interesting, as this mode of operation is used by several 
deployed encryption systems ([Rei01], [LMSon]) and the first word of plaintexts
is often an easily guessed constant such as the date, the sender’s identity, etc.,
and thus the attack is practical even in a ciphertext-only mode of attack.
However, the weakness does not extend to the Secure Socket Layer (SSL) protocol
that browsers use, as SSL uses a cryptographic hash function to combine the 
secret key with the IV. 








Scott Fluhrer, Itsik Mantin, and Adi Shamir 



In terms of keystream output, this attack is interested only in the first word 
of output from any given secret key and IV. Hence, we can simplify our model 
of the output. The first output word depends only on three specific permutation 
elements, as shown in the figure below showing the state of the permutation 
immediately after KSA. When those three words are as shown, the value labeled 
Z will be output as the first word. 



    index:    1      X      X + D
    value:    X      D        Z





In addition, we will define the resolved condition as any time within the
KSA where $i$ is greater than or equal to 1, $X$ and $Y$, where $X$ is defined as
$S_i[1]$ and $Y$ is defined as $X + S_i[X]$ (that is, $X + D$). When this
resolved condition occurs, with probability greater than $e^{-3} \approx 0.05$,
none of the elements $S[1]$, $S[X]$, $S[Y]$ will participate in any further
swaps⁵. In that case, the value will be determined by the values of $S_i[1]$,
$S_i[X]$ and $S_i[Y]$⁶. With probability less than $1 - e^{-3} \approx 0.95$, at
least one of the three values will participate in a swap, which will destroy the
resolved condition and set that element to an effectively random value. This
will make the output value effectively random. Our attack involves examining
messages with specific IV values such that, at some point, the KSA is in a
resolved condition, and where the value of $S[Y]$ gives us information on the
secret key. When we observe sufficiently many IV values, the actual value of
$S[Y]$ occurs detectably often.
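
The dependence of the first output word on three permutation entries can be made concrete. The sketch below is illustrative; the function names (`first_output_word`, `predicted_first_word`, `is_resolved`) are assumptions introduced here, and $X$, $D$, $Y$ follow the notation of the figure above.

```python
import random

def first_output_word(S):
    """First PRGA output for initial permutation S (one real PRGA round)."""
    N = len(S)
    S = list(S)
    i, j = 1, S[1]              # i = 0 + 1, j = 0 + S[1]
    S[i], S[j] = S[j], S[i]
    return S[(S[i] + S[j]) % N]

def predicted_first_word(S):
    """Z = S[X + D] with X = S[1], D = S[X]; None if 1, X, Y collide."""
    N = len(S)
    X = S[1]
    D = S[X]
    Y = (X + D) % N
    if len({1, X, Y}) < 3:
        return None             # the simple three-entry model does not apply
    return S[Y]

def is_resolved(S, i):
    """Resolved condition: i has passed positions 1, X and Y."""
    X = S[1]
    Y = (X + S[X]) % len(S)
    return i >= 1 and i >= X and i >= Y

# Compare the three-entry prediction with a real PRGA round.
rng = random.Random(6)
S = list(range(256))
rng.shuffle(S)
print(predicted_first_word(S), first_output_word(S))
```

Whenever 1, $X$ and $Y$ are mutually distinct, the two printed values agree, which is the observation the known IV attack relies on.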

7 Details of the Known IV Attack 

Whenever we discuss a concatenation of an IV and a secret key, we denote the
secret key as $SK$, the size of the IV by $I$, and the size of $SK$ as
$\ell - I$. The variable $K$ still represents the RC4 key, which in this case is
the concatenation of these two (e.g. in section 7.1
$K[1 \ldots \ell] = IV[0] \ldots IV[I-1]\, SK[0] \ldots SK[\ell - 1 - I]$).
The numbering of the rounds, as well as the terms $i_r$, $j_r$ and $S_r$, are as
defined in section 3.1.



7.1 IV Precedes the Secret Key 

First consider the case where the IV is prepended to the secret key. In this
circumstance, assuming we have a known $I$ word IV, and a secret key
($SK[0] \ldots SK[\ell - 1 - I]$), we attempt to derive information on a
particular word $B$ of the secret key ($SK[B]$ or $K[I + B]$) by searching for
IV values such that after round $I$ (that is, after $I + 1$ rounds),
$S_I[1] < I$ and $S_I[1] + S_I[S_I[1]] = I + B$. Then, with high likelihood
(probability $\approx e^{-2B/N}$ if we model the intermediate swaps as random),
we will be in a resolved condition after round $I + B$, and so the most probable
output value will be $S_{I+B}[I + B]$. We further note that, at round $I + B$,
the following assignments will take place:

$$j_{I+B} = j_{I+B-1} + K[B] + S_{I+B-1}[I + B]$$

$$S_{I+B}[I + B] = S_{I+B-1}[j_{I+B}]$$

Using algebra, we see that if we know the value of $j_{I+B-1}$ and $S_{I+B-1}$,
then given the first output word (which we will designate $Out$), we can make the
probabilistic assumption that $Out = S_{I+B}[I + B]$, and then predict the value
based on the assumption:

$$K[B] = S_{I+B-1}^{-1}[Out] - j_{I+B-1} - S_{I+B-1}[I + B]$$

where $S_r^{-1}[V]$ denotes the location within the permutation $S_r$ where the
value $V$ appears. Since $Out = S_{I+B}[I + B]$ more than 5% of the time, this
prediction is accurate that often, and effectively random less than 95% of the
time. By collecting sufficiently many values from different IVs, we can
reconstruct $K[B]$.

⁵ In our case we assume that c ≈ 1 (since $i$ is small), that the remaining swaps
in the key setup touch words with random $j$’s, and that the three events are
independent.
⁶ And, in particular, if 1, $X$, $Y$ are mutually distinct, then $S_i[Y]$ will be
output as the first word.

In the simplest scenario (3 word chosen IVs), the attack works as follows⁷:
suppose that we know the first $A$ words of the secret key
($K[3], \ldots, K[A + 2]$, with $A = 0$ initially), and we want to know the next
word $K[A + 3]$. We examine a series of IVs of the form $(A + 3, N - 1, V)$ for
approximately 60 different values for $V$. At the first round, $j$ is advanced by
$A + 3$, and then $S[i]$ and $S[j]$ are swapped, resulting in the key setup state
which is shown schematically below, where the top array is the combined IV and
secret key presented to the KSA, and the bottom array is a portion of the
permutation, and where the positions of the $i$, $j$ variables are indicated.



    key K:    A+3    N-1    V      K[3]   ...    K[A+3]

    index:    0      1      2      ...    A+3
    S:        A+3    1      2      ...    0
              i_0                         j_0



Then, on the next round, $i$ is advanced, and then the advance on $j$ is
computed, which happens to be 0. Then, $S[i]$ and $S[j]$ are swapped, resulting
in the below structure:



    key K:    A+3    N-1    V      K[3]   ...    K[A+3]

    index:    0      1      2      ...    A+3
    S:        A+3    0      2      ...    1
                     i_1                  j_1



⁷ This scenario was first published by Wagner in [Wag95].

Then, on the next round, $j$ is advanced by $V + 2$, which implies that each
distinct IV assigns a different value to $j$, and thus beyond this point, each IV
acts differently, approximating the randomness assumption made above. Since
the attacker knows the value of $V$ and $K[3], \ldots, K[A + 2]$, he can compute
the exact behavior of the key setup until before round $A + 3$. At this point, he
knows the value of $j_{A+2}$ and the exact values of the permutation $S_{A+2}$.
If the value at $S_{A+2}[0]$ or $S_{A+2}[1]$ has been disturbed, the attacker
discards this IV. Otherwise, $j$ is advanced by $S_{A+2}[i_{A+3}] + K[A + 3]$,
and then the swap is done, resulting in the below structure:



    key K:    A+3    N-1    V      K[3]   ...    K[A+3]

    index:    0      1      2      ...    A+3
    S:        A+3    0      S[2]   ...    S_{A+3}[A+3]
                                          i_{A+3}



The attacker knows the permutation S_{A+2} and the value of j_{A+2}. In addition, if
he knows the value of S_{A+3}[A + 3], he knows its location in S_{A+2}, which is the
value of j_{A+3}, and hence he would be able to compute K[A + 3]. We also note
that i_{A+3} has now swept past 1, S_{A+3}[1] and S_{A+3}[1] + S_{A+3}[S_{A+3}[1]], and thus
the resolved condition exists, and hence with probability p > 0.05, by examining
the value of the first word of RC4 output with this IV, the attacker will be able
to compute the correct value of K[A + 3]. Hence, by examining approximately
60 IVs with the above configuration, the attacker can rederive K[A + 3] with a
probability of success greater than 0.5.

By iterating the above process across the secret key, the attacker can rederive
ℓ words of secret key using 60ℓ chosen 3 word IVs.
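
The chosen IV procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: n = 8 (so N = 256), the helper names `ksa`, `first_output_word` and `vote_next_word` are hypothetical, the `oracle` callable stands in for observing the first output word for a chosen IV, and all 256 values of V are tried rather than about 60.

```python
from collections import Counter

N = 256  # permutation size for n = 8 (assumption of this sketch)


def ksa(key, rounds=N):
    """Run the first `rounds` rounds of the RC4 key setup; return (S, j)."""
    S, j = list(range(N)), 0
    for i in range(rounds):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S, j


def first_output_word(key):
    """First word produced by the RC4 output generator for `key`."""
    S, _ = ksa(key)
    i, j = 1, S[1]
    S[i], S[j] = S[j], S[i]
    return S[(S[i] + S[j]) % N]


def vote_next_word(known_words, oracle):
    """Tally guesses for the secret key word that follows `known_words`."""
    A = len(known_words)                 # number of secret words already known
    votes = Counter()
    for V in range(N):
        iv = [A + 3, N - 1, V]
        # Simulate the rounds that depend only on the IV and the known words.
        S, j = ksa(iv + known_words, rounds=A + 3)
        if S[0] != A + 3 or S[1] != 0:
            continue                     # S[0] or S[1] was disturbed: discard
        out = oracle(iv)
        # If the resolved condition survived, out = S_{A+3}[A + 3], whose
        # position in S_{A+2} is j_{A+3} = j_{A+2} + S_{A+2}[A + 3] + K[A + 3].
        votes[(S.index(out) - j - S[A + 3]) % N] += 1
    return votes
```

With an oracle such as `lambda iv: first_output_word(iv + secret)`, the most common entry of the returned counter is the candidate for the next key word; repeating the call with the recovered word appended to `known_words` walks down the key.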

The next thing to note is that the attack works for IVs other than those in
the specific (A + 3, N - 1, V) form. Any I word IV that, after I rounds, leaves
S_I[1] < I and S_I[1] + S_I[S_I[1]] = I + B will suffice for the above attack. In
addition, since the attacker is able to simulate the first I rounds of the key
setup, he is able to determine which IVs have this property. By examining all
IVs that have this property, we can extend this into a known IV attack, without
using an excessive number of IVs*. The probabilities to find the next word, and
the expected number of IVs needed to obtain 60 IVs of the proper form, are
given in Figure 6.
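
The relation between the two quantities reported in Figure 6 is simple arithmetic: if a random IV is usable with probability p, about 60/p IVs must be examined to collect 60 usable ones. The sketch below checks this against two rows of the figure; the function name is hypothetical, and the exponents 10^-5 and 10^-4 are an assumption chosen because they agree with the listed IV counts.

```python
def expected_ivs(p, needed=60):
    """Expected number of random IVs examined before `needed` usable ones appear."""
    return needed / p


# Probabilities read from the first and last rows of Figure 6 (assumed exponents).
print(round(expected_ivs(4.57e-5)))   # compare with the listed 1310000
print(round(expected_ivs(7.18e-4)))   # compare with the listed 83600
```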

7.2 IV Follows the Secret Key 

In the case that the IV is appended to the secret key, we need to take a different 
approach. The previous analysis attacked individual key words. When the IV 
follows the secret key, what we do instead is select IVs that give us the state of 

* Note that different IVs that lead to the same intermediate values of j are not
properly modeled by our random swap model. It is possible that specific values of j
will suggest specific incorrect keyword values, independently of the actual IV words.
One way to overcome this difficulty is to take only IVs which induce distinct values
of j. An alternative approach is to try all the high probability key words in parallel,
instead of concentrating only on the most probable one.

    IV Length    Probability      Expected IVs required
    3            4.57 x 10^-5     1310000
    4            4.50 x 10^-5     1330000
    5            1.65 x 10^-4     364000
    6            1.64 x 10^-4     366000
    7            2.81 x 10^-4     213000
    8            2.80 x 10^-4     214000
    9            3.96 x 10^-4     152000
    10           3.94 x 10^-4     152000
    11           5.08 x 10^-4     118000
    12           5.04 x 10^-4     119000
    13           6.16 x 10^-4     97500
    14           6.12 x 10^-4     98100
    15           7.21 x 10^-4     83200
    16           7.18 x 10^-4     83600

Fig. 6. For various prepended IV and known secret key prefix lengths, the probability
that a random IV will give us information on the next secret key word, and the expected
number of IVs required to derive the next secret key word.



the permutation at an early phase of the key setup, such as immediately after 
all the words of the secret key have been used for the first time. Given that only 
a few swaps have occurred up to that point, it is reasonably straightforward to
reconstruct those swaps from the permutation state, and hence obtain the secret 
key (see Appendix D for one such method). 

To illustrate the attack in the simplest case, suppose we have an A word
secret key, and a 2 word IV. Further suppose that the secret key was weak in
the sense that, immediately after A rounds of KSA, S_{A-1}[1] = X, X < A, and
X + S_{A-1}[X] = A. This is a low probability event (p ≈ 0.00062 if A = 13)*. For
such a weak secret key, the attacker can assume the value of j_{A-1} + S_{A-1}[A]
and then examine IVs with a first word of W = V - (j_{A-1} + S_{A-1}[A]) (this
assumption does increase the amount of work by a factor of N, and forces us
to verify the assumption, which we can do by observing a consistent predicted
value of S_{A-1}). With such IVs, the value of j_A will be the preselected value V.
Then, S[A] and S[V] are swapped, and so S_A[A] = S_{A-1}[V]. Here, assuming
V was neither 1 nor S_{A-1}[1], then the resolved condition has been established,
and with probability > 0.05, S_{A-1}[V] will be the first word output. Then, by
examining such IVs with the second word being at least 60 different values, we
can observe the output a number of times and derive the value of S_{A-1}[V] with
good probability. By selecting all possible values of V, we can directly observe
the state of the S_{A-1} permutation, from which we can rederive the secret key.
We will denote this result as key recovery.

* A straightforward assumption that the permutation S_{A-1} is equidistributed gives
a much lower probability, 13/256 x 1/256 ≈ 0.00020; however, S_{A-1} is not
equidistributed: the first A bytes are biased towards small values.
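
The choice of the first IV word can be checked mechanically. The sketch below is illustrative only: it assumes n = 8, uses hypothetical helper names, and computes j_{A-1} + S_{A-1}[A] directly from the secret key, whereas the attacker would have to guess that sum. It confirms that W = V - (j_{A-1} + S_{A-1}[A]) forces j_A = V and moves S_{A-1}[V] into position A.

```python
N = 256  # permutation size for n = 8 (assumption of this sketch)


def ksa_rounds(key, rounds):
    """Run the first `rounds` rounds of the RC4 key setup; return (S, j)."""
    S, j = list(range(N)), 0
    for i in range(rounds):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S, j


def first_iv_word(secret, V):
    """IV word W that forces j_A = V when the IV follows the A word secret key."""
    A = len(secret)
    S, j = ksa_rounds(secret, A)       # S_{A-1} and j_{A-1}
    return (V - (j + S[A])) % N


secret = [0x10, 0x22, 0x35, 0x47, 0x59]            # illustrative key, A = 5
V = 200
W = first_iv_word(secret, V)
S_before, _ = ksa_rounds(secret, len(secret))
S_after, j_after = ksa_rounds(secret + [W, 0], len(secret) + 1)
print(j_after == V, S_after[len(secret)] == S_before[V])
```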

If X + S_{A-1}[X] = A + 1, a similar analysis would appear to apply. By
assuming S_{A-1}[A + 1] and j_{A-1}, we can swap S_{A-1}[V] into S_{A+1}[A + 1]
for N - 2 distinct IVs for any particular V. However, the value of j_{A+1} is always
the same for any particular V, and so the probabilities that a particular IV
outputs the value S[V] are not independently distributed. This effect causes
the reading of the permutation state to be 'noisy', that is, for some values of
V, we see S[V] as the first word far more often than our analysis expected,
and for other values of V, we see it far less often. Because of this, some of the
entries S_{A-1}[V] cannot be reliably recovered. Simulations assuming a 13 word
secret key and n = 8 have shown that an average of 171 words of the S_{A-1}
permutation state can be successfully reconstructed, including an average of 8
words of (S_{A-1}[0], ..., S_{A-1}[12]), which immediately give you effectively 8 key
words. With this information, the key is reduced enough that it can be brute
forced. We will denote this result as key reduction.

If we have a 3 word IV, then there are more types of weak secret keys. For
example, consider a secret key where S_{A-1}[1] = 1 and S_{A-1}[A] = A. Then, by
assuming j_{A-1}, we can examine IVs where the first word has a value W so that
the new value of j_A is 1, and so S_{A-1}[1] and S_{A-1}[A] are swapped, leaving the
state after round A to be:





            0            1       ...   A - 1       A       A + 1            A + 2
    K:      SK[0]        SK[1]   ...   SK[A - 1]   IV[0]   IV[1]            IV[2]
    S:      S_{A-1}[0]   A       ...   ...         1       S_{A-1}[A + 1]   S_{A-1}[A + 2]
                         j_A                       i_A



Then, by assuming S_{A-1}[A + 1] (which with high probability is A + 1, and
will always be at most A + 1), we can examine IVs with the second word IV[1] =
V - (1 + S_{A-1}[A + 1]), for an arbitrary V, which will cause j_{A+1} = V and swap
the value of S_{A-1}[V] into S_{A+1}[A + 1]. Assuming V isn't either 1 or A, then the
resolved condition has been set up, and using a number of values for the third
IV word Z, we can deduce the value of S_{A-1}[V] for an arbitrary V, giving us
the permutation after A rounds.

There are a number of other types of weak keys that the attacker can take 
advantage of, summarized in Figure 7. 

The last weak secret key listed in Figure 7 is especially interesting, in that
the technique that exposes the weakness is rather different than that of the other
weak secret keys listed. Immediately after A rounds, the state is:

[Figure 7 table: only fragments are recoverable. Surviving entries: the conditions
S_{A-1}[1] = V < A and S_{A-1}[X] + X = A + 2; the IV properties "Cycle" and
"Swap with Y"; and the result "Key recovery".]

Fig. 7. Weak secret keys with 3 word postfix IVs. Listed are the conditions on the
S_{A-1} permutation that distinguish them, the IV properties that the attacker searches
for to reveal S[V], the probability that this class of weak key will occur with n = 8
and a 16 word secret key, and the result of the attack on the weak key.

The initial IV word causes S_{A-1}[V] and [...] to be swapped, leaving the
state as:

[State diagram: key row SK[1], ..., SK[V], ...; permutation entry S_{A-1}[A + 1]; pointer j_A.]

Now, to inquire about the value of S_{V+Z}[W + const], we examine numerous
IVs with second and third words that all set the value of j_{A+2} to be W. The
KSA will continue for V + Z - (A + 2) more rounds until i now points to the
element S_{V+Z}[V + Z]. At this point, since we haven't gone through a great
number of rounds since we knew the value of j (since V + Z - (A + 2) < A - 4),
then with high probability, j_{V+Z+1} = W + const, where const is a constant
term that depends only on the state of the permutation S_A. If this is true, then
S_{V+Z+1}[V + Z] = S_{V+Z}[W + const], and if the elements S[1] and S[V] have
not been disturbed (again, this happens with high probability), the resolved
condition has been achieved, and the first output word will be biased towards
S_{V+Z}[W + const]. In addition, because the value of const will be the same
independent of W, its value can easily be determined, thus allowing the attacker
to observe many of the values of S_{V+Z}. This class of weak keys requires far more
known IVs to exploit, but also occurs relatively frequently.

If we have a 4 word* IV, then the same general approach as the previous
analysis can be used to recover virtually all secret keys, given sufficient IVs. First,
we assume j_{A-1}, S_{A-1}[A], S_{A-1}[A + 1], S_{A-1}[A + 2], S_{A-1}[A + 3]**. Then, based
on this assumption, we search for IVs that, after round A + 3, set S_{A+3}[1] = V
and S_{A+3}[V] = Z for V, Z < A + 4, V + Z > A + 4, and we note the value of
j_{A+3} = W. Then, we save the value of V + Z, the value W and the value output
as the first word for that particular IV. With nontrivial probability, the value of
this word will be S_{V+Z}[W + const_{V+Z}], where const_{V+Z} is a constant term that
depends on the secret key, and the value V + Z. Since that value is independent
of the IV, we can collect numerous possible values of S_{V+Z}[W + const_{V+Z}] for
various values of V + Z, and use that to first reconstruct const_{V+Z}, and then
reconstruct S_{V+Z}.

8 Related-Key Attacks on RC4 

In this section, we discuss two related-key attacks based on weaknesses discussed
previously in this paper. They work within the following model: the attacker is
given a black box that has a randomly chosen RC4 key K inside it, an output
button and an input tape of |K| words. In each step the attacker can either press
the output button to get the next output word, or write Δ on the tape, which
causes the black box to restart the output generation process with a new key
defined as K' = K ⊕ Δ. The purpose of the attacker is to find the key K (or
some information about it).
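
The black box of this model is easy to simulate, which is convenient for experimenting with the attacks that follow. The sketch below assumes n = 8; the class and method names (`RelatedKeyBox`, `write_delta`, `press_output`) are hypothetical labels for the tape and the output button, not names taken from the paper.

```python
N = 256  # permutation size for n = 8 (assumption of this sketch)


def rc4_stream(key):
    """Generator yielding successive RC4 output words for `key`."""
    S, j = list(range(N)), 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    i = j = 0
    while True:
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % N]


class RelatedKeyBox:
    """Holds a fixed secret key K; the attacker only sees output words."""

    def __init__(self, key):
        self._key = list(key)
        self._stream = rc4_stream(self._key)

    def write_delta(self, delta):
        """Restart output generation under the related key K' = K xor delta."""
        related = [k ^ d for k, d in zip(self._key, delta)]
        self._stream = rc4_stream(related)

    def press_output(self):
        """Return the next output word of the current stream."""
        return next(self._stream)
```

Note that the stored key K never changes: each call to `write_delta` derives a fresh related key from the original K, which matches the definition K' = K ⊕ Δ above.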

8.1 Related-Key Attack Based on the Invariance Weakness 

This attack works when the number of key words is a power of two. It consists of
n stages where in stage q the q-th bit of every key word is exposed***. The predicate
CheckKey takes as input an RC4 blackbox and a parameter q (the stage number)
and decides whether the key in the box is special 2^q-exact. This purpose can be

* This approach generalizes in the obvious way to longer IVs.

** Note that S_{A-1}[x] ≤ x for x ≥ A. This limits the size of the search required.

*** In fact, K[1] is fully revealed during the first stage (see Figure 4).




Weaknesses in the Key Scheduling Algorithm of RC4 






achieved by randomly sampling key bits that are irrelevant for the 2^q-exactness of
the key and estimating the expected length of q-patterned output. For a special
2^q-exact key the expected length will be significantly longer than in a random
output (where it is less than 2) and thus CheckKey works in time O(1). The
procedure Expand takes as input an RC4 blackbox and a parameter q (the stage
number), assumes that the key in the box is special 2^{q-1}-exact, and makes it
special 2^q-exact. The method for doing so is by enumerating all the possibilities
for the q-th bits (2^ℓ such possibilities) and invoking CheckKey to decide when
the key in the box is special 2^q-exact. Expand works in a slightly different way
for q = 1 and q = n. For q = 1, except for the LSBs, it determines the complete
K[0] (by forcing it to 1) and MSB(K[1]). For q = n, there is only one 2^n-exact
key and consequently we can calculate the output produced from this key and
replace CheckKey by simple comparison. The time complexity of this stage is
O(2^{n+ℓ}) for q = 1 and O(2^ℓ) for any other q.

The total time required for the attack is thus O(2^{n+ℓ}) + (n - 1)O(2^ℓ) =
O(2^{n+ℓ}). For a typical RC4_{n=8} key with 32 bytes, the complexity of exhaustive
search is completely impractical (2^256), whereas the complexity of the new attack
is only O(2^{n+ℓ}) = O(2^40).

8.2 Related-Key Attack Based on the Known IV Weakness 

In this section we use the known IV weaknesses to develop an efficient related 
key attack on RC4. 

The attack consists of 3 stages, where in the first two stages we gain informa- 
tion on the first three words of the secret key, and in the third stage we iterate 
down the key, and expose each word of the key successively. The stages of the 
attack are as follows: 

Step 1. This step attempts to find values of K[0], K[1] such that S[1] = 1,
and reveal the value of K[2]. The procedure is to select random values of
(X, Y), and for each such random value, write onto the tape 240 vectors
with the initial four words (X, Y, Z, W) for Z ∈ {0, N/4, N/2, 3N/4} and
with 60 distinct random values of W, and for each such vector, press the
output button. If X and Y are such that S[1] = 1 (for the modified key),
then the output of the first word will be biased towards 3 + (K[2] ⊕ Z), unless
that value happens to be 1. Hence, for at least 3 of the selected values of
Z, the first word outputs will be biased towards one of const, const + N/4,
const + N/2, const + 3N/4. This is detectable, and also by examining the
value of const, the attacker can reconstruct the value of K[2]. We expect to
try N random values of (X, Y) before finding a pair that is appropriate.
Step 2. This step attempts to find the values of K[0], K[1]. The procedure is to
write on the tape 60 vectors with the initial four words (X, Y, Z, W), where
X, Y are the values recovered in the previous step, Z = (N - 3) ⊕ K[2],
and with 60 distinct random values of W, and for each such vector, press
the output button. This particular initial sequence assures that S_2[1] = 1
and S_2[2] = S_1[0] = K[0], and hence the output will be biased towards K[0].
Once that has been recovered, K[1] can be computed.
Step 3. This step iteratively recovers individual words of the key. It operates
by running a subprocedure that assumes that we have already recovered
(K[0], ..., K[A - 1]), and want to learn the value of K[A]. The procedure is
to write 60 vectors that have the property that, given the known values of
(K[0], ..., K[A - 1]), S_{A-1}[1] = X < A and X + S_{A-1}[X] = A. With
60 such vectors, we can use the procedure shown in 7.1 to rederive K[A].

The total time required for the attack is thus (because 2^n > ℓ):

Step1 + Step2 + (ℓ - 3) * Step3 = O(2^{n+8}) + 2^6 + (ℓ - 3)2^6 = O(2^{n+8})

For an RC4 key with n = 8 the time complexity is O(2^16) and is essentially
independent of the key length.



8.3 Comparing the Attacks 

Both attacks are able to completely reconstruct the randomly chosen RC4 key*
with a number of chosen keys and amount of work that is significantly below
that of brute force (except for extremely short RC4 keys). The first attack scales
upwards as the key grows longer, while the time complexity of the second attack
is independent of key length, with a cross-over point at ℓ = 8.

However, due to the second word weakness, future implementations of RC4
are likely to discard some prefix of the output stream, and in this case the second
attack becomes difficult to apply: output word x depends on 2x + 1 permutation
elements immediately after KSA, and all the 2x + 1 elements must occur before
r for the resolved condition to hold. On the other hand, the first attack extends
well, in that the probability of the output words being patterned drops modestly
as the number of discarded words increases.
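
The cross-over point mentioned in this comparison can be reproduced with a small calculation. The cost formulas used here, 2^(n+ℓ) for the invariance-based attack and 2^(n+8) for the known-IV-based attack, are an assumption: they are a reading of the complexities in Sections 8.1 and 8.2 with constant factors ignored, and the function names are hypothetical.

```python
def invariance_attack_cost(n, key_words):
    """Assumed cost 2^(n + l) of the attack of Section 8.1 (constants ignored)."""
    return 2 ** (n + key_words)


def known_iv_attack_cost(n):
    """Assumed cost 2^(n + 8) of the attack of Section 8.2, independent of l."""
    return 2 ** (n + 8)


n = 8
for key_words in (4, 8, 16, 32):
    first = invariance_attack_cost(n, key_words)
    second = known_iv_attack_cost(n)
    cheaper = "first" if first < second else "second" if second < first else "tie"
    print(key_words, first, second, cheaper)
```

Under these assumptions the two costs coincide at ℓ = 8, and the first attack applies only when the number of key words is a power of two.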



9 Discussion 

Section 3 describes an interesting weakness of RC4 which results from the simplicity
of its key scheduling algorithm. We recommend neutralizing this weakness
by discarding the first N words of each generated stream. After N rounds, every
element of S is swapped at least once and the permutation S and the index j
are expected to be "independent" of the initialization process.
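
A minimal sketch of the recommended countermeasure, assuming n = 8 (N = 256): the generator runs for N extra steps and throws those words away before returning any output. The function name and the `drop` parameter are hypothetical.

```python
N = 256  # permutation size for n = 8 (assumption of this sketch)


def rc4_keystream(key, length, drop=N):
    """Return `length` RC4 output words after discarding the first `drop` words."""
    S, j = list(range(N)), 0
    for i in range(N):                       # key scheduling algorithm
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = []
    for _ in range(drop + length):           # output generation
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % N])
    return out[drop:]
```

Calling `rc4_keystream(key, length, drop=0)` gives the unmodified stream, which makes it easy to compare the two variants in experiments.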

Section 6 describes a weakness of RC4 in a common mode of operation in
which attacker visible IVs are concatenated with a fixed secret key. It is easy
to extend the attack to other simple types of combination operators (e.g., when
we XOR the IV and the fixed key) with essentially the same complexity. We
recommend neutralizing this weakness by avoiding this mode of operation, or
by using a secure hash to form the key presented to the KSA from the IV and
secret key.
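
One way to follow the second recommendation is sketched below. The text only asks for "a secure hash"; the choice of SHA-256, the ordering IV followed by secret key, the 16 byte truncation, and the function name are all assumptions made for illustration.

```python
import hashlib


def per_packet_key(iv: bytes, secret: bytes, key_len: int = 16) -> bytes:
    """Derive the key presented to the KSA by hashing the IV and the secret key."""
    # SHA-256 and the truncation length are illustrative choices, not taken
    # from the paper.
    return hashlib.sha256(iv + secret).digest()[:key_len]


key = per_packet_key(b"\x00\x00\x01", b"example secret")
print(key.hex())
```

Because the attacker-visible IV no longer appears as literal words of the RC4 key, the attacker cannot simulate the first rounds of the key setup as the attacks of Section 7 require.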

* The first attack works only for some key lengths.

[Graph omitted. Horizontal axis: b - size of the patterned prefix.]

Fig. 8. This graph demonstrates the probabilities of special keys (2^q-exact with K[0] =
1, MSB(K[1]) = 1) of RC4_{n=8,ℓ=16} to produce streams with long patterned prefixes.



A Applying the Attack to WEP 

The Wired Equivalent Privacy (WEP) protocol is designed to provide privacy 
to packet based wireless networks based on the 802.11 standard (see [LMSon]). 
It encrypts by taking a secret key and a per-packet 3 byte IV, and using the 
IV followed by the secret key as the RC4 key. Then, it transmits the IV, and 
the RC4 encrypted payload. By using the results from Section 7.1, we can show 
how, by examining enough ciphertext packets, to reconstruct the secret key for 
WEP. 

We assume that the attacker is able to retrieve the first byte of the RC4
output from each packet*. By the analysis done in Section 7.1, to recover key
byte B, the attacker needs to know the previous key bytes, and then search for
IVs that set up the permutation such that

    V = S_{B+3}[1] < B + 3
    V + S_{B+3}[V] = B + 3                                        (1)

* Because of the payload format used with 802.11, the first byte of each plaintext
payload is a known constant, and hence the attacker is able to derive the first byte
of RC4 output.

With about 60 such IVs, the attacker can rederive the key byte with rea- 
sonable probability of success. The number of packets required to obtain that 
number of IVs depends on the exact IVs that the sender uses. Although the 
802.11 standard does not specify how an implementation should generate these 
IVs, common practice is to use a counter to generate them. 
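
The search for usable IVs amounts to simulating the first B + 3 rounds of the key setup and testing Equation 1. The sketch below is an illustration under assumptions: n = 8, a 3 byte IV placed before the key, B taken as the number of key bytes already known, and a hypothetical function name.

```python
N = 256  # permutation size for n = 8 (assumption of this sketch)


def satisfies_equation_1(iv, known_key_bytes):
    """True if the IV sets up the permutation as Equation 1 requires."""
    B = len(known_key_bytes)             # index of the key byte being attacked
    key = list(iv) + list(known_key_bytes)
    S, j = list(range(N)), 0
    for i in range(B + 3):               # rounds that use only known key material
        j = (j + S[i] + key[i]) % N
        S[i], S[j] = S[j], S[i]
    V = S[1]
    return V < B + 3 and V + S[V] == B + 3


print(satisfies_equation_1([3, 255, 7], []))     # an IV of the (A + 3, N - 1, V) form
print(satisfies_equation_1([3, 255, 251], []))   # third word disturbs S[0]
```

An attacker would apply this predicate to every captured IV and keep, for each key byte, the packets for which it holds.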



A.1 Analysis of IVs Generated by a Little Endian Counter

If the IVs are generated by a multibyte counter in little endian order (and hence
the first byte of the IV increments the fastest), then the attacker can search for
IVs of the form (B, 255, V) for 3 ≤ B < 8. If he can collect these for 60 different
values of V, then he can derive the secret key with little work. This requires
approximately 4,000,000 packets.
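
The packet count quoted above can be checked directly. The sketch assumes that the counter starts at zero, that the first IV byte is the least significant counter byte, that B ranges over 3 to 7, and that the first 60 values of V are the ones collected; the function name is hypothetical.

```python
def packets_until_collected(first_bytes=range(3, 8), values_needed=60):
    """Packets sent by a little endian counter starting at zero before every
    IV (B, 255, V) with B in `first_bytes` and V < `values_needed` has appeared."""
    # IV (b0, b1, b2) corresponds to counter value b0 + 256 * b1 + 65536 * b2.
    last = max(B + 255 * 256 + V * 65536
               for B in first_bytes for V in range(values_needed))
    return last + 1


print(packets_until_collected())
```

The third IV byte advances once every 65536 packets, so 60 distinct values of V require roughly 60 * 65536 packets, in line with the figure of approximately 4,000,000.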



A.2 Analysis of IVs Generated by a Big Endian Counter

If the IVs are generated by a multibyte counter in big endian order (and hence 
the last byte of the IV increments the fastest), then the attacker can, as above, 
search for IVs of the form (B,255,V). This requires approximately 1,000,000 
packets to collect the requisite IVs, assuming that the counter starts from zero. 

However, if the counter doesn't start from zero, the attacker has an alternative
strategy available to him. He can assume the first several bytes of secret
key, and then search for IVs that set up the permutation as in Equation 1. If
the attacker assumes the first two bytes of secret key, then for each initial IV
byte, there are approximately 4 settings of the remaining two bytes that set
up the permutation as required to rederive a particular key byte. Hence, with
approximately 1,000,000 packets, and an additional 2^16 work factor, he can still
rederive the key.

It is common practice in the industry to extend the length of the WEP secret 
key (which is specified as 40 bit). Because the above attacks recover each key 
byte individually, the time complexity of the attack grows linearly rather than 
exponentially with the key length, and the data complexity of the attack remains 
essentially constant. Consequently, even an extremely long key is not immune to 
this attack. 

Shortly after the publication of a preliminary version of this paper, Stubblefield,
Ioannidis and Rubin ([SIR01]) implemented the attack and successfully
derived a 128 bit WEP key by observing the network during a single evening.
Several optimization techniques can probably reduce the required amount of
data to the number of packets sent on a fully loaded network in less than 15
minutes.

B Ciphertext-Only Distinguishers Based on the Invariance Weakness

The distinguishers we presented in Section 5.1, as well as most of the distin- 
guishers mentioned in the literature (for RC4 and other stream ciphers) assume 
knowledge of the plaintext in order to isolate the XORed key stream. 

However, in practice the only information the attacker has is typically some 
statistical knowledge about the plaintext, e.g., that it contains English text. 
Combining the non-random behaviors of the plaintext and the key-stream is not 
always possible, and there are cases where XORing biased streams results in
a totally random stream (e.g., when one stream is biased in its even positions
and the other stream is biased in its odd positions). We prove here that if the 
plaintexts are English texts, it is easy to construct a ciphertext-only distinguisher 
from our biases. The intuition of this construction is that the biases described 
in Section 5.1 are in the distribution of the LSBs, and consequently they can be 
combined with the non-random distribution of the LSBs of English texts. 

There are many major biases in the distribution of the LSBs of English texts, 
and they can be combined with biases of the key-stream words in various ways. 
In Theorem 3, we show how to combine the distribution of the first LSB of the
RC4 output stream, with the first order statistics of English texts*:

Theorem 3. Let C be the ciphertext generated by RC4 from a random key and
the ASCII representation of plaintexts, distributed according to the first order
statistics of English texts. Let p be the probability of a random key to be special
2-exact. Then C can be distinguished from a random stream by analyzing the
first few words of about ^ different RC4 streams. 

For example, for RC4„^s with 8 byte keys, p = 2“^®, which implies a reliable 
ciphertext-only distinguisher that works with less than 2"*° data. The proof of 
Theorem 3 is based on the observation that the LSB of a random English text 
character is zero with probability of about 55%. The formal proof is omitted due 
to space limitations. 

It is important to note that Theorem 3 does not use all the statistical infor- 
mation which is available in either the key-stream or the plaintext distributions, 
and consequently does not represent the best possible attack. 
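
The plaintext statistic that the proof relies on, the fraction of characters whose least significant bit is zero, is straightforward to measure on any sample. The helper below is a hypothetical illustration; it does not reproduce the 55% figure, which depends on the text corpus used.

```python
def lsb_zero_fraction(text: str) -> float:
    """Fraction of ASCII characters in `text` whose least significant bit is 0."""
    data = text.encode("ascii")
    return sum(1 for byte in data if (byte & 1) == 0) / len(data)


sample = "the quick brown fox jumps over the lazy dog"
print(lsb_zero_fraction(sample))
```

A value noticeably different from 0.5 on representative plaintext is what allows the bias in the key-stream LSBs to survive the XOR with the plaintext.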

C The Sampling Resistance of RC4 

Most of the Time/Memory/Data tradeoff attacks on stream ciphers are based
on the following paradigm. The attacker keeps a database of [state, output] pairs
(sorted by output) and looks up every subsequence of the output stream in this
database. When a (sufficiently long) database sequence is located in the output,

* Since the purpose of the theorem is only to demonstrate this approach, we ignore
the fact that the distribution of the first characters in an English sentence differs
from the distribution of mid-text characters.







Scott Fluhrer, Itsik Mantin, and Adi Shamir 



the attacker can conclude that the actual state is the one stored along with this 
sequence and predict the rest of the stream. 

A drawback of this approach is that the large database must be stored on
hard disk(s) whose random access time is about a million times slower than a
computational step. To improve that attack we can keep on disk only states that
are guaranteed to produce outputs with some rare but easily recognizable property
(e.g., starting with some prefix a). In this case only output sequences that have
this property have to be searched in the database, and thus the expected time
and the expected number of disk probes are significantly reduced.

In general, producing a pair [state, output] with such a rare property costs
much more than producing a random pair. O(1/p) random states are required to
find a single pair, where p is the probability of a random output to have this
property. However, if we can efficiently enumerate states that produce such outputs,
the number of sampled states decreases dramatically, and this method can be
applied without significant additional cost during the preprocessing stage. The
sampling resistance of a stream cipher provides a lower bound on the efficiency
of such enumeration.

Such an attack can be applied to RC4 in two ways, based on the KSA and 
PRGA parts. An attack on the generation part constructs a database of pairs 
[RC4 state, output substring] and analyzes all the substrings along a single out- 
put stream. The database construction is very simple since it is easy to enumerate 
states which produce outputs that have some constant prefix. However, this enu- 
meration seems to be useless due to the huge effective key of this part (1684 bits) 
which makes such a tradeoff attack completely impractical. A more promising 
approach is based on the KSA part which uses a key of 40-256 bits and might be 
vulnerable to tradeoff attacks. In this case, the pairs in the database are [secret 
key, prefix of the output stream], and the attack requires prefixes from a large 
number of streams (instead of a single long stream). 

The correlation described in Section 4 provides an efficient sampling of keys
that are more likely to produce output prefixes of the patterned type specified
above (predictable mod b).

For example, consider the problem of sampling M keys which are transformed
by the KSA into streams whose first five words are fixed (mod 16). This property
of random streams has probability of 2^-20, and the expected number of disk
probes during the actual attack is reduced by this factor. For stream ciphers with
high sampling resistance, such a filter would increase the preprocessing time by
a factor of one million, as one would have to sample a million random keys in
order to find a single "good" key. For RC4 (due to the invariance weakness),
the preprocessing time increases by a factor of less than four, as more than one
quarter of the exact special keys produce such streams, which have this fixed
pattern. Consequently, the preprocessing stage is accelerated by a factor of 2^18.
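
The figures in this example follow from a short calculation, sketched below with exact fractions. The variable names are hypothetical, and the factor of four is the upper bound quoted in the text rather than a measured value.

```python
from fractions import Fraction

# Probability that the first five output words each match a fixed value mod 16.
p = Fraction(1, 16) ** 5

naive_factor = 1 / p          # random keys sampled per "good" key without the weakness
rc4_factor = 4                # bound quoted in the text for RC4 with special keys
speedup = naive_factor / rc4_factor

print(p, naive_factor, speedup)
```

The naive factor is 2^20, about one million, and dividing it by four gives the stated preprocessing speedup.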

To summarize this section, we proved that RC4 has relatively low Sampling
Resistance, which greatly improves the efficiency of tradeoff attacks based on its
KSA.

D Deriving the Secret Key from an Early Permutation State

Given the values S_A[0], ..., S_A[A - 1], one method to find all the values of
K[0], ..., K[A - 1] that result in such a permutation is:

    i = j = 0
    For i = 0 ... A - 1
        X = S^{-1}[S_A[i]]
        If i < X < A
            Branch over all values of 0 ≤ X < A s.t. X > i or
            S[X] ≠ [...], running the remaining part of this
            algorithm for all such values.
        K[i] = X - j - S[i]
        j = X
        Swap(S[i], S[j])
    Verify that [S[0], ..., S[A - 1]] = [S_A[0], ..., S_A[A - 1]]

The number of times this algorithm will perform an iteration is bounded by
[...], where [...] is the number of values 0 ≤ x < A where S^{-1}[x] < A. Because
this number is typically quite small, this algorithm is typically efficient.
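
A simplified variant of this procedure is sketched below. It is not the full algorithm: the branching step is omitted, so it only returns the single candidate key obtained when each target value is fetched from wherever it currently sits, and it reports failure if a later swap would have disturbed an earlier position. It assumes n = 8, and the function names are hypothetical.

```python
N = 256  # permutation size for n = 8 (assumption of this sketch)


def ksa_prefix(key, rounds):
    """Permutation after the first `rounds` rounds of the RC4 key setup."""
    S, j = list(range(N)), 0
    for i in range(rounds):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S


def recover_key_no_branch(target):
    """Given target = [S_A[0], ..., S_A[A-1]], return one key K[0..A-1]
    producing it, or None if the non-branching path does not reproduce it."""
    A = len(target)
    S, j, key = list(range(N)), 0, []
    for i in range(A):
        X = S.index(target[i])           # current position of the wanted value
        key.append((X - j - S[i]) % N)   # K[i] = X - j - S[i]
        j = X
        S[i], S[j] = S[j], S[i]
    # Verification step: rerun the key setup with the candidate key.
    return key if ksa_prefix(key, A)[:A] == list(target) else None


print(recover_key_no_branch(ksa_prefix([1, 2, 3], 3)[:3]))
```

Handling the cases where this sketch returns None, or enumerating every key that leads to the same state, is exactly what the branching step of the procedure above is for.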

An algorithm with a better run time lower bound could be given by using
the values of S_A[A], ..., S_A[N - 1].

Abstract. SSC2 is a stream cipher that operates by XORing the output
of two "half-ciphers". The first half-cipher is constructed from a linear
feedback shift register (LFSR) with a non-linear filter. The second half-cipher
is constructed from a lagged Fibonacci generator (LFG) and a
multiplexor that chooses values from the Fibonacci register. The second
half-cipher has a small cycle length π ≈ 2^52. The initial state of the
LFSR is derived by performing a fast correlation attack on the sequence
resulting when XORing the key-stream at an interval of π words (thus
cancelling the effect of the LFG). This attack requires around 2^25 words
of this sequence and a few hours of computation. The initial state of the
LFG is then derived from around 15300 outputs using around one second
of computation.

Keywords: SSC2, fast correlation attack. 



1 Introduction 

SSC2 is a stream cipher proposed by Zhang, Carroll and Chan [2]. The cipher
is designed for software implementation and is very fast. This paper describes
a practical cryptanalysis of SSC2 that requires around 2^25 words of known key-stream
(from a run of 2^52 words) and a few hours work on a 250 MHz processor
with 100 MB of memory.

SSC2 is based on a linear feedback shift register (LFSR) and a lagged Fibonacci
generator (LFG). An LFSR consists of a register that stores a set of bits called
the state, and a function that is linear modulo 2. This function updates the state
bit-by-bit. An LFG consists of a register which stores a set of integers modulo
N (once again called the state) and a function that is linear modulo N. This
function updates the state integer-by-integer. In SSC2, the modulus is N = 2^32,
and the integers are stored as 32-bit blocks called words.

SSC2 achieves its speed by using 32-bit operations. The stream is derived
from a 127-bit LFSR, a 17-word LFG and a multiplexor that chooses values
from the register of the LFG. The 127-bit register for the LFSR is stored in four
32-bit words (the extra bit is forced to 1 in the filter function). After the states
of the LFSR and LFG are initialised, the following steps are repeated to produce
each word of output:

1. 32 bits of the LFSR state are updated simultaneously. A non-linear filter
(NLF) computes a 32-bit output N_i from the four words in the state of the
LFSR.

2. The LFG state is updated. The upper 16 bits and lower 16 bits of the LFG
output are swapped to form L_i.

3. The multiplexor uses the four most significant bits (MSBs) of the updated
word to choose one of 16 values in the LFG state to be the output M_i.

4. The output of the cipher is Z_i = (L_i + M_i mod 2^32) ⊕ N_i, where ⊕ denotes
XOR.

The value N_i is called the output of the LFSR half-cipher, while V_i = (L_i +
M_i mod 2^32) is called the output of the LFG half-cipher.
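
The way the two half-ciphers are combined in steps 2 and 4 can be sketched as follows. This is only an illustration: it assumes that L_i and M_i are combined by addition modulo 2^32, as the "mod 2^32" in the output formula suggests, and it does not model the LFSR update, the non-linear filter, the LFG update, or the multiplexor selection. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def swap_halves(word):
    """Exchange the upper and lower 16 bits of a 32-bit word (step 2)."""
    return ((word << 16) | (word >> 16)) & MASK32


def ssc2_output(lfg_out, mux_word, nlf_out):
    """Combine the half-ciphers: Z_i = ((L_i + M_i) mod 2^32) XOR N_i (step 4)."""
    L = swap_halves(lfg_out)
    V = (L + mux_word) & MASK32      # output V_i of the LFG half-cipher
    return V ^ nlf_out               # XOR with the LFSR half-cipher output N_i


print(hex(swap_halves(0x12345678)))
print(hex(ssc2_output(0x0000FFFF, 1, 0)))
```

Because the final step is an XOR, two output words whose LFG contributions are equal reveal the XOR of the corresponding LFSR outputs, which is the property the attack on the small LFG period exploits.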

Previous Results. In the rump session of Crypto 2000, Rose and Hawkes [6]
reported on correlations between the least significant bits (LSBs) of certain words
output from SSC2. They also noted that the LFG has a small period π = 17 · 2^31 ·
(2^17 - 1) ≈ 2^52. Computing Z'_i = Z_i ⊕ Z_{i+π} = N_i ⊕ N_{i+π} allows the LFSR to
be attacked in isolation. The correlation in the LSBs of Z'_i allows an attacker to
distinguish the output of SSC2 from a random bit stream. Another analysis by
Hawkes and Rose [7] found an attack on the LFSR half-cipher in isolation that 
requires 382 words and around 2^^ time. Bleichenbacher and Meier [1] found an 
attack on the entire cipher that finds the initial state of the LFSR using around 
$2^{52}$ words of known key-stream with around time. This attack exploits the
small period $\pi$. Following this, the initial state of the LFG is found using around
2^^ known outputs of the LFG half-cipher with around 2^® time. 

Concurrent Results. Independently, Fluhrer, Crowley and Harvey had also
identified a number of correlations in the LFSR half-cipher [4], and gave other
attacks. They noticed that there are actually two different correlations, apparently
equally valid, with the LSB of the $N_i$.

New Results. The first part of the attack in this paper exploits the small
period of the LFG by performing a fast correlation attack on the stream $Z'_i$,
based on the correlation noted in [6]. This part of the attack requires around
$2^{25}$ words of known key-stream (from a run of $2^{52}$ words) with a few hours of
processing time on a 250 MHz Sun UltraSPARC (see Section 3). The attack
applies simple techniques that increase the accuracy and speed of any fast
correlation attack. After the output of the LFSR half-cipher is removed, the attack
exploits properties of the LFG noted in [1] to identify when the multiplexor has
selected specific words in the LFG register. This information is used to
reconstruct the initial state of the LFG (Section 4). This part of the attack requires
around 15300 known outputs of the LFG half-cipher (presumed already known
from the previous phase) and around a second of processing on a 250 MHz Sun
UltraSPARC.

2 A Description of SSC2 

LFSR half-cipher. The LFSR state is stored as four 32-bit words denoted
$(X_{i+3}, X_{i+2}, X_{i+1}, X_i)$. The state is updated to $(X_{i+4}, X_{i+3}, X_{i+2}, X_{i+1})$ by
computing

$X_{i+4} = X_{i+2} \oplus (X_{i+1} << 31) \oplus (X_i >> 1)$,

where ‘<<’ denotes a zero-fill left shift and ‘>>’ denotes a zero-fill right shift.
The least significant bit of $X_i$ is ignored. If this sequence were converted to a
bit-stream $b_t$, then the bit-sequence would satisfy the linear recursion:

$b_{t+127} = b_{t+63} + b_t \bmod 2$.

The corresponding characteristic polynomial is $x^{127} + x^{63} + 1$. This polynomial is
irreducible modulo 2, which means that the bit sequence has a period of $(2^{127} - 1)$.
The LFSR is implemented using a 4-word array S[1], ..., S[4] containing
$X_{i+3}, \ldots, X_i$. At each clock, the LFSR computes
$A = S[2] \oplus (S[3] << 31) \oplus (S[4] >> 1)$. The values are shifted up
($S[4] \leftarrow S[3]$, $S[3] \leftarrow S[2]$, $S[2] \leftarrow S[1]$)
and the value of S[1] is set to A. After the LFSR is updated, the NLF output
$N_i$ is computed. The NLF uses a variety of operations: XOR; modular addition;
SWAP(A), which swaps the upper 16 bits and lower 16 bits of A; and $X_i \vee 1$, which denotes
the word $X_i$ with the LSB forced to 1.
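
As a rough illustration (a sketch, not the authors' code), the word-level update can be written in Python and checked against the bit-level recursion. The helper names `lfsr_step`, `word_stream` and `bit_stream` are hypothetical, and the bit ordering (bit j of word $X_i$ becomes bit $b_{32i+j}$) is an assumption chosen so that the two descriptions line up.

```python
MASK32 = 0xFFFFFFFF

def lfsr_step(state):
    """Map (X_{i+3}, X_{i+2}, X_{i+1}, X_i) to (X_{i+4}, X_{i+3}, X_{i+2}, X_{i+1})."""
    x3, x2, x1, x0 = state
    # Zero-fill shifts on 32-bit words: keep only the low 32 bits of the left shift.
    new = x2 ^ ((x1 << 31) & MASK32) ^ (x0 >> 1)
    return (new, x3, x2, x1)

def word_stream(state, n):
    """Return the words X_0, ..., X_{n-1} (n >= 4), starting from (X_3, X_2, X_1, X_0)."""
    words = list(reversed(state))
    while len(words) < n:
        state = lfsr_step(state)
        words.append(state[0])
    return words[:n]

def bit_stream(words):
    """Flatten words into bits, least significant bit first (assumed ordering)."""
    return [(w >> j) & 1 for w in words for j in range(32)]
```

With this ordering the ignored bit is $b_0$, and every later bit produced by `lfsr_step` can be compared with $b_{t+127} = b_{t+63} + b_t \bmod 2$ for $t \geq 1$.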

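NLF Algorithm

1 $A \leftarrow X_{i+3} + (X_i \vee 1) \bmod 2^{32}$, with $c_1 \leftarrow$ carry;

2 $A \leftarrow \mathrm{SWAP}(A)$;

3 if ($c_1 = 0$) then $A \leftarrow X_{i+2} + A \bmod 2^{32}$ with $c_2 \leftarrow$ carry;

4 else $A \leftarrow (X_{i+2} \oplus (X_i \vee 1)) + A \bmod 2^{32}$ with $c_2 \leftarrow$ carry;

5 $N_i \leftarrow (X_{i+1} \oplus X_{i+2}) + A + c_2 \bmod 2^{32}$;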

The LFG half-cipher. The LFG state consists of 17 words $(Y_{i+16}, \ldots, Y_i)$. The
state is updated to $(Y_{i+17}, \ldots, Y_{i+1})$ using the recurrence:

$Y_{i+17} = Y_{i+12} + Y_i \bmod 2^{32}$.    (1)

The LFG is implemented using a 17-word array G[1], ..., G[17]. The key scheduling
initialises G[1], ..., G[17] to the values $Y_{16}, \ldots, Y_0$, and initialises two pointers
r and s to 17 and 5 respectively. The output $L_i$ is defined as $L_i = \mathrm{SWAP}(Y_i)$.
The LFG state is updated by computing

$G[r] + G[s] = Y_i + Y_{i+12} = Y_{i+17} \bmod 2^{32}$,

and replacing the value of G[r] (which was $Y_i$) with the value of $Y_{i+17}$. The
values of r and s are then decreased by 1 (when r or s reaches 0, the value is
reset to 17). The output $M_i$ is defined as

$M_i = G[1 + (s + (Y_{i+17} >> 28) \bmod 16)]$.

As a result of the reduction modulo 16, the formula for $M_i$ in terms of the
sequence $\{Y_i\}$ changes according to the value of $i \bmod 17$. Now that $L_i$, $M_i$
and $N_i$ have been computed, SSC2 outputs $Z_i = (L_i + M_i \bmod 2^{32}) \oplus N_i$,
increments i and repeats the process. This paper does not address the issue of
obtaining the key from the initial states of the LFSR and LFG, so we do not
describe the key scheduling algorithm.
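
The LFG update and multiplexor can be sketched as follows. This is an illustrative reading of the description above, not a reference implementation: the class name `LFG` and its layout are hypothetical, and the sketch assumes that the pointers r and s are decremented before the multiplexor index is formed, which is the ordering that agrees with the multiplexor differences quoted in Section 4.

```python
MASK32 = 0xFFFFFFFF

def swap16(w):
    """Swap the upper and lower 16 bits of a 32-bit word."""
    return ((w << 16) | (w >> 16)) & MASK32

class LFG:
    """17-word lagged Fibonacci generator with multiplexor, using a 1-based array G."""

    def __init__(self, y):
        # y = [Y_0, ..., Y_16]; G[1..17] holds Y_16, ..., Y_0 (G[0] is unused).
        assert len(y) == 17
        self.g = [0] + list(reversed(y))
        self.r, self.s = 17, 5

    def step(self):
        """Return (L_i, M_i, Y_{i+17}) and advance the state by one word."""
        g = self.g
        l = swap16(g[self.r])                      # L_i = SWAP(Y_i)
        new = (g[self.r] + g[self.s]) & MASK32     # Y_{i+17} = Y_i + Y_{i+12}
        g[self.r] = new
        self.r = self.r - 1 or 17                  # a pointer that reaches 0 wraps to 17
        self.s = self.s - 1 or 17
        m = g[1 + ((self.s + (new >> 28)) % 16)]   # multiplexor output M_i
        return l, m, new
```

The LFG half-cipher output would then be `(l + m) & MASK32` for each step.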

3 Attacking the LFSR Half-Cipher

The attack on the LFSR half-cipher is an advanced fast correlation attack,
exploiting an observed correlation between the least significant bit of the filtered
output words and five of the LFSR state bits. The attack is aided greatly by the
fact that the feedback polynomial of the LFSR is only a trinomial: $x^{127} + x^{63} + 1$.
Meier and Staffelbach observed in [10] in 1989 that “any correlation to an LFSR with
less than 10 taps should be avoided”.

3.1 Background: Fast Correlation Attacks 

The seminal work on Fast Correlation Attacks is [10], and another paper which 
explains them and explores some heuristic optimisations is [5]. 

Many stream ciphers have an underlying Linear Feedback Shift Register, 
and produce output by applying some nonlinear function to the state of the 
register; many schemes which appear different in structure are equivalent to this 
formulation. SSC2’s LFSR half-cipher is such a construction. 

If the nonlinear function is perfect, there should be no (useful) correlation
between the output of the generator and any linear function of the state bits.
Conversely, if there is a correlation between output bits and any linear
combination of the state bits, this may be used by a fast correlation attack to
recover the initial state. Consider the output bits of the generator, $\{B_i\}$, to
be outputs from an LFSR, $\{A_i\}$, modified by erroneous bits $\{E_i\}$ with some
probability $P < 0.5$. The probability of error P is the complement of the known
correlation probability. Put simply, the technique of a Fast Correlation Attack utilises the
recurrence relations obeyed by the $A_i$ to identify particular bits in the output
stream which have a high probability of being erroneous, and correct (flip) them.
To do this, the attack computes $(B_j + \sum_{i \in T} B_i \bmod 2)$ for each recurrence
relation $A_j + \sum_{i \in T} A_i = 0 \pmod 2$ (these are also called parity check equations).
The error probability for bit j, $P(B_j \neq A_j)$, is computed based on the number
of recurrence relations $(B_j + \sum_{i \in T} B_i = 0 \bmod 2)$ satisfied and the number of
recurrence relations unsatisfied. If there are enough bits in the output stream
for the given P, this process will eventually converge until a consistent LFSR
output stream remains. Linear algebra is then used to recover the corresponding
initial state of the LFSR.
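
The counting step at the heart of this technique can be sketched in a few lines. The sketch below is only the parity-check bookkeeping, not a full fast correlation attack; the function names are hypothetical, and it assumes the trinomial recurrence $a_{t+127} = a_{t+63} + a_t$ together with its repeated squares as the set of check equations.

```python
def lfsr_bits(seed, n):
    """Extend 127 seed bits to n bits with a[t+127] = a[t+63] ^ a[t]."""
    a = list(seed)
    while len(a) < n:
        a.append(a[-64] ^ a[-127])
    return a[:n]

def parity_checks(max_squarings):
    """Offsets (0, 63*2^k, 127*2^k): the trinomial and its repeated squares."""
    return [(0, 63 << k, 127 << k) for k in range(max_squarings + 1)]

def count_checks(b, checks):
    """For each bit, count the checks it takes part in and how many of them fail."""
    total = [0] * len(b)
    unsat = [0] * len(b)
    for offsets in checks:
        for t in range(len(b) - offsets[-1]):
            fail = b[t] ^ b[t + offsets[1]] ^ b[t + offsets[2]]
            for o in offsets:
                total[t + o] += 1
                unsat[t + o] += fail
    return total, unsat
```

On an error-free sequence no check fails; flipping a single bit makes every check that contains that bit fail, which is the signal the attack uses to decide which bits to correct.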

3.2 Fast Correlation Attack on SSC2 

Recall that $\pi = 17 \cdot 2^{31} \cdot (2^{17} - 1)$ is the period of the Lagged Fibonacci Generator
half-cipher. If two segments of output stream $\pi$ apart are exclusive-ored together,
the contributions from the LFG half-cipher cancel out, leaving the exclusive-or
of two filtered LFSR streams to be analysed.

Let $Z'_i = Z_i \oplus Z_{i+\pi} = N_i \oplus N_{i+\pi}$. $N_i$ exhibits a correlation to a linear function
of the bits of the four-word state $S_i$. Define
$l(S) = S[1]_{15} \oplus S[1]_{16} \oplus S[2]_{31} \oplus S[3]_0 \oplus S[4]_{16}$,
where the subscript indicates a particular bit of the word (with
bit 0 being the least significant bit). Then $P(\mathrm{LSB}(N_i) = l(S_i)) = 5/8$. (Note that
this correlation is incorrectly presented in [1]). Intuitively, three of these terms
are the bits that are XORed to form the least significant bit of $N_i$; the other
two terms contribute to the carry bits that influence how this result might be
inverted or affected by carry propagation. Obviously $N_{i+\pi}$ is similarly correlated
to the state $S_{i+\pi}$, but because the state update function is entirely linear, the
bits of $S_{i+\pi}$ are in turn linear functions of the bits of $S_i$. So $\mathrm{LSB}(Z'_i)$ exhibits a
correlation to $L(S_i) = l(S_i) \oplus l(S_{i+\pi})$.
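
The correlation can be examined empirically. The sketch below is an assumption-laden illustration: `nlf` follows the NLF listing of Section 2 as reconstructed from the text, `l_func` is the linear function $l(S)$, and the states are drawn uniformly at random rather than from a real LFSR run. It only estimates the agreement rate; the value 5/8 is the paper's figure and is not asserted by the code.

```python
import random

MASK32 = 0xFFFFFFFF

def swap16(w):
    return ((w << 16) | (w >> 16)) & MASK32

def nlf(x3, x2, x1, x0):
    """Non-linear filter on (X_{i+3}, X_{i+2}, X_{i+1}, X_i), as read from the listing."""
    xh = x0 | 1                       # X_i with the LSB forced to 1
    a = x3 + xh
    c1 = a >> 32                      # carry out of the first addition
    a = swap16(a & MASK32)
    a += x2 if c1 == 0 else (x2 ^ xh)
    c2 = a >> 32                      # carry out of the second addition
    return ((x1 ^ x2) + (a & MASK32) + c2) & MASK32

def bit(w, j):
    return (w >> j) & 1

def l_func(x3, x2, x1, x0):
    """l(S) with S[1] = X_{i+3}, S[2] = X_{i+2}, S[3] = X_{i+1}, S[4] = X_i."""
    return bit(x3, 15) ^ bit(x3, 16) ^ bit(x2, 31) ^ bit(x1, 0) ^ bit(x0, 16)

def agreement_rate(samples, seed=1):
    """Fraction of random states for which LSB(N) equals l(S)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        s = [rng.getrandbits(32) for _ in range(4)]
        hits += (nlf(*s) & 1) == l_func(*s)
    return hits / samples
```

Calling `agreement_rate(20000)` gives a quick estimate that can be compared with the stated probability.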

Fluhrer [4] shows that there is actually a second linear function
$l'(S) = S[4]_{15} \oplus S[1]_{16} \oplus S[2]_{31} \oplus S[3]_0 \oplus S[4]_{16}$
with the same correlation. We find it
interesting that in all the test data sets we have used, admittedly a limited
number, our program always “homes in” on $l(S)$ and not $l'(S)$. The existence
of this second correlation makes it harder for the program to converge to the
correct correlation and explains why more input data is required than would be
inferred from previous results such as [5]. We are continuing to explore this area.

The words of the LFSR state are updated according to a bitwise feedback 
polynomial, but since the wordsize (32 bits) is a power of two, entire words of 
state also obey the recurrence relation, being related by the 32nd power of the 
feedback polynomial. 

If the two streams $Z_i$ and $Z_{i+\pi}$ were independent, then the correlation probability
would be $P(\mathrm{LSB}(Z'_i) = L(S_i)) = 17/32$. However these streams are clearly
not independent and, experimentally, we have determined that there is a “second 
order” effect and in practice the error probability is approximately 0.446, rather 
than the expected 0.46875. This fortuitous occurrence makes the fast correlation 
attack more efficient, and counters to some extent the confusion caused by the 
existence of two correlation functions. 
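
The figure expected under independence follows from the usual rule for combining two independent binary approximations (often called the piling-up lemma; naming it is an addition here, not the paper's wording). As a worked check, with each filtered stream agreeing with its linear function with probability 5/8:

```latex
\begin{align*}
P\bigl(\mathrm{LSB}(Z'_i) = L(S_i)\bigr)
  &= \tfrac{1}{2} + 2\left(\tfrac{5}{8} - \tfrac{1}{2}\right)^{2}
   = \tfrac{1}{2} + \tfrac{1}{32} = \tfrac{17}{32}, \\
P(\text{error}) &= 1 - \tfrac{17}{32} = \tfrac{15}{32} = 0.46875 .
\end{align*}
```

The observed error probability of about 0.446 is lower than this, which is the second-order effect the paragraph describes.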

The attack on the LFSR half-cipher proceeds by first gathering approximately
32,000,000 words $Z'_i$, of which only the least significant bits are utilised
in the attack. This requires two segments of a single output stream, separated
by $\pi$. We then perform fast correlation calculations, to attempt to “correct” the
output stream, on different amounts of input varying between 29,000,000 bits
and 32,000,000 bits. Empirically, about two thirds of these trials will terminate and
produce the correct output $L(S_i)$; some of the trials might give an incorrect
answer, while others will “bog down”, performing a large number of iterations
without correcting a significant number of the remaining errors. The sections
below describe the fast correlation attack itself in some detail. If the attack is
thought to have corrected the output, linear algebra is used to relate this back
to the initial state $S_0$. The sequence $Z'_i = Z_i \oplus Z_{i+\pi}$ can be reconstructed from
the initial state to verify that $S_0$ is correct. If $S_0$ is incorrect or the attack “bogs
down”, then a different number of input bits will be tried. Thanks to the numerous
optimisations discussed below, a single fast-correlation computation when
successful takes about an hour on a 250MHz Sun UltraSPARC (not a particularly
fast machine by today’s standards) and uses about 70MB of memory. When
a computation “bogs down” it is arbitrarily terminated after 1000 rounds, and
this takes a few hours. For a particular output set, the full initial state is often
recovered in as little as one hour, and it is very unlikely that the correct state
will not be found within a day.



3.3 Increasing the Accuracy of Fast Correlation Attacks 

The discussion below applies mostly to LFSRs with low weight feedback, in 
particular where a trinomial feedback is in use. 

A number of papers have been written since [9] applying heuristic techniques 
to speeding up or increasing the accuracy of the basic technique of fast correlation 
attacks. These include [3,5,8,11]. We first spent a lot of time examining some 
of these techniques, and variation in their basic parameters, to gain an intuitive 
understanding of what is useful and what is not. 

The original technique of [9] distinguished between “rounds” and “itera- 
tions”, where a round started with each of the bits having the same a priori 
error probability. A new probability was calculated for each bit based on the 
probabilities of the other bits involved in parity check equations. Subsequent 
iterations performed the same calculations based on the updated probabilities, 
until enough bits had error probabilities exceeding some threshold, or a
predetermined number of iterations had been exceeded. We found the arguments
in favour of performing iterations unsatisfying, since it seemed that the new 
probabilities were just self-reinforcing. Eventually, we made structural changes 
to our program which made it impossible to do iterations, and found an overall 
increase in accuracy. 

The basic correlation algorithm has the error probability P as an input pa- 
rameter; P is kept constant throughout the computation, and the bit probabil- 
ities are reset to P at the beginning of each round. In reality, the error prob- 
abilities decrease with each round (at least initially), so this approach results 
in inaccurate estimates for the bit probabilities. We found that as the real er- 
ror probability approaches 0.5, then a constant value of P is unlikely to result 
in a successful attack. The computation is more likely to be successful if P is 
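estimated at each round. For a given P, it is straightforward to calculate the
proportion of parity check equations expected to be satisfied by the data: a
trinomial check is satisfied exactly when an even number of its three bits are in
error, so the expected proportion is $\alpha = (1 + (1 - 2P)^3)/2$. This
process is easily reversible, too; having observed the proportion $\alpha$ of parity check
equations satisfied, it is easy to calculate the error probability P:¹

$\delta = 1 - 2\alpha, \quad P = \frac{1 + \sqrt[3]{\delta}}{2}$,

where the real cube root is taken (so $\sqrt[3]{\delta} < 0$ when $\alpha > 1/2$).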

Since each round begins by counting parity check equations, it is a simple mat- 
ter to calculate P for that round. This technique essentially forbids the use of 
iterations, and obviates techniques like “fast reset”, but nevertheless speeds up 
the attack and increases the likelihood of success. 
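
A minimal sketch of this per-round estimate, assuming trinomial check equations and independent bit errors (the function names are hypothetical):

```python
import math

def satisfied_fraction(p):
    """Expected fraction of trinomial checks satisfied when each bit errs with probability p."""
    return (1 + (1 - 2 * p) ** 3) / 2

def error_probability(alpha):
    """Invert satisfied_fraction: recover p from the observed fraction alpha."""
    delta = 1 - 2 * alpha
    root = math.copysign(abs(delta) ** (1 / 3), delta)  # real cube root, sign preserved
    return (1 + root) / 2
```

For error probabilities close to 0.5 the satisfied fraction is only slightly above one half, which is why a large number of checks must be counted before the estimate is useful.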

We felt that having the greatest possible number of parity check equations 
for each bit was important to the operation of the algorithm, so we performed a 
one-time brute force calculation to look for low-weight multiples of the feedback 

¹ This formula is based on the check equations being trinomials.




A Practical Cryptanalysis of SSC2 






polynomial other than the obvious ones (the powers of the basic polynomial).
We found a number of them. As well as $x^{127} + x^{63} + 1$, the attack uses

$x^{16129} + x^{4033} + 1$, $x^{12160} + x^{4159} + 1$, $x^{12224} + x^{8255} + 1$,

$x^{16383} + x^{12288} + 1$, $x^{4384} + x^{12351} + 1$

and all possible powers of these polynomials. For each bit, the parity checks with 
that bit at the left, in the middle, and at the right, were all used. For 30,000,000 
input bits, an average of 200 parity check equations applied to each bit. 
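
Whether a candidate trinomial is a multiple of the feedback polynomial can be tested by polynomial division over GF(2). The sketch below represents polynomials as Python integers (bit k is the coefficient of $x^k$); the helper names are hypothetical, and it makes no claim about which candidates pass beyond the powers of the feedback polynomial itself.

```python
def trinomial(a, b):
    """The polynomial x^a + x^b + 1 as an integer bit mask."""
    return (1 << a) | (1 << b) | 1

def gf2_mod(p, m):
    """Remainder of p divided by m, both polynomials over GF(2)."""
    dm = m.bit_length() - 1
    while p.bit_length() - 1 >= dm:
        p ^= m << (p.bit_length() - 1 - dm)   # cancel the leading term of p
    return p

FEEDBACK = trinomial(127, 63)

def is_multiple(p):
    """True if p is divisible by x^127 + x^63 + 1."""
    return gf2_mod(p, FEEDBACK) == 0
```

A brute-force search of the kind described would call `is_multiple(trinomial(a, b))` over a range of exponent pairs.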

Lastly, we made the observation that relatively early in the computation, a 
significant number of bits satisfied all of the available parity check equations. 
We called these fully satisfied bits. Experimentally we determined that when 
more than a few hundred such bits were available, and if the computation was 
eventually successful, they were almost all correct, so that any subset of 127 of 
them had a high probability of forming a linearly independent set of equations 
in the original state bits, which could then be solved in a straightforward man- 
ner. Computationally, taking this early opportunity to calculate the answer is 
a significant performance improvement. In a typical run with 30,000,000 bits of 
input, 5,040 fully satisfied bits were available after 16 rounds, all of which turned 
out to be correct, while the full computation required 64 rounds. This is not as 
great an optimisation as it sounds, because the rounds get faster as the number 
of bits corrected decreases (see below) . 

3.4 Increasing the Speed of Fast Correlation Attacks 

At the same time as we were analysing the theoretical basis for improvements 
in the algorithm, we also looked at purely computational optimisations to the 
algorithm. When the probability of error of individual bits is variable, probability 
computations are complex and require significant effort for each bit, as well as 
the requirement to store floating-point numbers for each bit. When the error 
probability P is assumed the same for all bits at the beginning of a round, 
the computation is significantly eased. More importantly, the likelihood that
a particular bit is in error can be expressed as a threshold on the number
of unsatisfied parity check equations, given the total number of parity check
equations for that bit, and the probability P.

The number of parity check equations available for a particular bit is least
near the edges of the data set, and increases toward the middle. During the
first pass over the data, the number of equations available for each bit is simply
counted (this is computationally negligible compared to actually checking the
equations) and the indexes where this total differs from that for the previous bit
are stored. Thus, it requires very little memory to derive the total number of parity
checks for a particular bit in subsequent passes. In each round, the first pass over
the data calculates (and stores) the number of unsatisfied checks for each bit.
From the total proportion of parity checks unsatisfied, P is calculated for this
round, and from that, threshold values above which a bit will be considered to
be in error are calculated for each number of parity check equations. When P <
0.4 it is approximately correct that more than half of the parity checks unsatisfied
implies that the probability of the bit being erroneous is greater than 0.5, and
the bit should be corrected. However, when P > 0.4, more equations need to be
unsatisfied before flipping a bit is theoretically justified. The algorithm’s eventual
success is known to be very dependent on these early decisions.
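
One way to derive such thresholds is a direct Bayesian calculation. The sketch below is an assumption, not the authors' formula: it treats every check as a trinomial whose other two bits are independently in error with probability P, and returns the smallest count of unsatisfied checks for which the posterior error probability exceeds one half. The function name is hypothetical.

```python
import math

def flip_threshold(m, p):
    """Smallest u such that a bit with u of m checks unsatisfied is more likely wrong than right."""
    q = 2 * p * (1 - p)                  # P(a check fails | the bit itself is correct)
    prior = math.log(p / (1 - p))        # prior log-odds of the bit being in error
    step = math.log((1 - q) / q)         # log-odds gained per unsatisfied check
    for u in range(m + 1):
        # u unsatisfied checks add step each, m - u satisfied checks subtract step each.
        if prior + (2 * u - m) * step > 0:
            return u
    return None                          # no count of failures justifies a flip
```

Under these assumptions the threshold sits a little above m/2 and moves further above it as P approaches 0.5, in line with the behaviour described in the text.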

A pass is then made through the data, flipping the bits that require it. For 
each bit that is flipped, the count of unsatisfied parity checks is corrected, not 
only for that bit, but for each bit involved in a parity check equation with it. 
The correction factor is accumulated in a separate array so that the correction 
is applied to all bits atomically. Bits which have no unsatisfied parity checks are 
noted. In the early rounds, this incremental approach doesn’t save very much, 
but as fewer bits are corrected per round the saving in computation becomes 
very significant. 

Typically another 50% of the overall computation is then saved when the 
count of fully satisfied bits significantly exceeds the length of the register, and 
the answer is derived from linear algebra. The net effect of the changes described 
in this and the previous section is a factor of some hundreds in the time required 
for data sets of about 100,000 bits over a straightforward implementation. We 
did not have time to find the speedup for larger data sets, as it would have 
required too long to run the original algorithm. 



4 Attacking the LFG Half-Cipher 

This attack derives the initial state $IV = (Y_{16}, \ldots, Y_0)$ of the LFG from outputs
of the LFG half-cipher: $V_i = L_i + M_i \bmod 2^{32} = Z_i \oplus N_i$. Much of the analysis
is based on dividing the 32-bit words into two 16-bit blocks: $A = A'' \| A'$. Note
that

$Y'_{i+17} - Y'_{i+12} - Y'_i = 0 \bmod 2^{16}$,

$Y''_{i+17} - Y''_{i+12} - Y''_i = f_i \bmod 2^{16}$,

where $f_i \in \{0, 1\}$ denotes the carry bit to the upper half in the sum $(Y_i + Y_{i+12})$.
The value $\mu_i = (Y_{i+17} >> 28)$ chooses $M_i$ from the set $\{Y_{i+1}, \ldots, Y_{i+17}\}$:
$M_i = G[1 + (s + \mu_i \bmod 16)]$. The value $\alpha_i$ such that $M_i = Y_{i+\alpha_i}$ is the
multiplexor difference. We always write $\mu_i$ in hexadecimal form, and $\alpha_i$ in decimal
form. The particular word chosen depends on $\mu_i$ and s, where s is directly related
to the value of $\bar{i} = i \bmod 17$. For example, if $\mu_i = 0$, then $\alpha_i = 12$ unless $\bar{i} \in \{4, 5\}$,
in which case $\alpha_i = 11$.
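
The dependence of the multiplexor difference on $\bar{i}$ can be reproduced by tracking which subscript each array slot holds. This is a sketch under the same assumption as the description of the LFG in Section 2 (pointers r and s are decremented before the multiplexor index is formed); the function name is hypothetical.

```python
def multiplexor_difference(i, mu):
    """Return alpha_i such that M_i = Y_{i + alpha_i}, for output index i and nibble mu."""
    # idx[k] is the subscript of the word currently stored in G[k]; G[1..17] = Y_16..Y_0.
    idx = [None] + [16 - k for k in range(17)]
    r, s = 17, 5
    for step in range(i + 1):
        idx[r] = step + 17        # G[r] is overwritten with Y_{step+17}
        r = r - 1 or 17           # a pointer that reaches 0 wraps to 17
        s = s - 1 or 17
    return idx[1 + ((s + mu) % 16)] - i
```

For instance, `multiplexor_difference(i, 0)` can be tabulated for i = 0, ..., 16 to see where the value departs from 12.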



4.1 Motivation 

The attack exploits a property of outputs $(V_i, V_{i+12})$ with $\alpha_i = \alpha_{i+12} = 12$.
These are called good pairs; all other pairs $(V_i, V_{i+12})$ are bad pairs. The initial
state can be derived from good pairs using the following observations.

1. If $(V_i, V_{i+12})$ is good, then $M_i = Y_{i+12}$ and $M_{i+12} = Y_{i+24}$, so

$V'_{i+12} - V''_i = (Y'_{i+24} + Y''_{i+12}) - (Y''_{i+12} + Y'_i + g_i)$

$= Y'_{i+24} - Y'_i - g_i \bmod 2^{16}$,

where $g_i \in \{0, 1\}$ is the carry bit from the lower half to the upper half in
the sum $V_i = L_i + M_i$. Note that if an attacker is given a good pair, then
$(Y'_{i+24} - Y'_i)$ can be derived from $(V'_{i+12} - V''_i)$ by guessing $g_i$.

2. Every 16-bit half-word $Y'_i$ is a linear function (mod $2^{16}$) of the half-word
initial state $IV' = (Y'_{16}, \ldots, Y'_0)$. Thus $(Y'_{i+24} - Y'_i)$ is also a linear function
(mod $2^{16}$) of $IV'$. We say that the values $(Y'_{i+24} - Y'_i)$ are linearly independent
(LI) if the linear equations for $(Y'_{i+24} - Y'_i)$ are linearly independent. If the
attacker knows a set of 17 LI values $(Y'_{i+24} - Y'_i)$ then the values of $IV'$ can
be determined by solving the system of linear equations.

3. Now, having obtained $IV'$, all values $Y'_i$ in the sequence $\{Y'_i\}$ can be
computed. For each of the 17 good pairs, the value of $Y'_{i+12}$ allows

$Y''_i = V'_i - Y'_{i+12} \bmod 2^{16}$

to be computed. Computing $Y''_i$ completes the word $Y_i = Y''_i \| Y'_i$. The 17
equations for $Y_i$ (in terms of the complete initial state IV) will also be LI, so
this system can be solved to find the initial state, and the attack is complete.
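
The half-word identities in observations 1 and 3 can be checked on a simulated sequence. The sketch below is illustrative only: it forces the multiplexor choice $M_i = Y_{i+12}$ (a good pair by construction) rather than simulating the multiplexor, and all function names are hypothetical.

```python
M16, M32 = 0xFFFF, 0xFFFFFFFF

def swap16(w):
    return ((w << 16) | (w >> 16)) & M32

def lfg_sequence(seed, n):
    """Extend [Y_0, ..., Y_16] to n words with Y_{i+17} = Y_{i+12} + Y_i mod 2^32."""
    y = list(seed)
    while len(y) < n:
        y.append((y[-5] + y[-17]) & M32)
    return y

def good_output(y, i):
    """V_i and the carry g_i when the multiplexor selects M_i = Y_{i+12}."""
    l, m = swap16(y[i]), y[i + 12]
    g = ((l & M16) + (m & M16)) >> 16      # carry from the lower half to the upper half
    return (l + m) & M32, g

def low_difference(v_i, v_i12, g_i):
    """Y'_{i+24} - Y'_i mod 2^16, from a good pair (V_i, V_{i+12}) and the carry g_i."""
    return ((v_i12 & M16) - (v_i >> 16) + g_i) & M16

def upper_half(v_i, y_low_i12):
    """Y''_i = V'_i - Y'_{i+12} mod 2^16."""
    return ((v_i & M16) - y_low_i12) & M16
```

In the attack the carry is not known and has to be guessed, which is the subject of Section 4.2.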

There remain two problems: guessing the 17 carry bits gi and identifying 
good pairs. 



4.2 Guessing the Carry Bits 

The attack will have to try various combinations of values for $g_i$ before the
correct carry bits are found. The attack avoids trying all $2^{17}$ combinations by
computing an accurate prediction $p_i$ for the value of $g_i$. Note that if $V'_i < 2^{15}$
then the carry from the sum $(Y''_i + Y'_{i+12} \bmod 2^{16})$ is more likely to be one than
zero. That is, we can predict that $g_i = 1$. Conversely, if $V'_i \geq 2^{15}$ then the carry
$g_i$ is more likely to be zero than one. Based on this, the attack either sets $p_i = 1$
when $V'_i < 2^{15}$ or sets $p_i = 0$ when $V'_i \geq 2^{15}$. Hence, rather than guessing the
carry bits $g_i$, the attack guesses the 17 errors $e_i = p_i \oplus g_i$. The attack first guesses
that there are no errors (all $e_i = 0$), then one error (one value of $e_i = 1$), two
errors, and so forth. The accuracy of the prediction, $P(p_i = g_i)$, depends on $V'_i$.
Experimental results are shown in Table 1.



Table 1. Experimental approximation to the accuracy of the prediction, $P(p_i = g_i)$,
as a function of the four MSBs of $V'_i$.

The 4 MSBs of $V'_i$    0,1    2,3    4,5    6,7    8,9    A,B    C,D    E,F
$P(p_i = g_i)$          0.96   0.83   0.7    0.56   0.56   0.69   0.8    0.93

If all the pairs are good then there will be only a small number of errors.
When choosing the 17 LI values $(Y'_{i+24} - Y'_i)$, the attack gives preference to
values with accurate predictions as there are fewer errors, and the attack will
be faster. As shown below, the attack has a small probability of choosing one
or more bad pairs. If the correct initial state is not found while the number of
errors is small, then this suggests that one of the pairs is bad, so our attack
chooses another set of 17 LI values $(Y'_{i+24} - Y'_i)$.
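
The shape of Table 1 can be compared with an idealised model. The sketch below assumes the two 16-bit addends are independent and uniformly distributed, which the paper does not claim; it counts, for each value of the lower half $V'_i$, how many addend pairs produce a carry. The function names are hypothetical.

```python
def predict_carry(v_low):
    """Prediction p_i for the carry g_i from the lower half V'_i of the output."""
    return 1 if v_low < (1 << 15) else 0

def accuracy(lo, hi):
    """P(p_i = g_i) averaged over V'_i in [lo, hi), for uniform independent 16-bit addends."""
    total = correct = 0
    for v in range(lo, hi):
        with_carry = (1 << 16) - 1 - v     # pairs (a, b) with a + b = v + 2^16
        without = v + 1                    # pairs (a, b) with a + b = v
        correct += with_carry if predict_carry(v) else without
        total += with_carry + without
    return correct / total
```

For example, `accuracy(0x0000, 0x2000)` covers the column for MSBs 0,1. The experimental figures in Table 1 are close to, but not identical with, the values from this model.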

4.3 Identifying Good Pairs 

There are 16 possible values for $\alpha_i$, so we expect good pairs to occur every $16^2 =
256$ words (on average). The problem is identifying good pairs. The trick is to
identify triples $(V_i, V_{i+12}, V_{i+17})$ with $\alpha_i = \alpha_{i+12} = \alpha_{i+17} = 12$. Bleichenbacher
and Meier [1] noted that if $\alpha_i = \alpha_{i+12} = \alpha_{i+17}$, then

$\Delta_i = V_{i+17} - V_{i+12} - V_i \bmod 2^{32} \in \{0, 1, -2^{16}, 1 - 2^{16}\} = \mathcal{A}$.

A triple of outputs $(V_i, V_{i+12}, V_{i+17})$ that results in $\Delta_i \in \mathcal{A}$ is said to be valid,
because $\alpha_i = \alpha_{i+12}$ $(= \alpha_{i+17})$ with probability close to one (which fulfills part
of the requirement for a good pair).

Note that $\mu_{i+17} = \mu_{i+12} + \mu_i + c \pmod{16}$, where $c \in \{0, 1\}$, due to the
recurrence (1). Hence, the possible combinations for $(\mu_i, \mu_{i+12}, \mu_{i+17})$ that result
in $\alpha_i = \alpha_{i+12} = \alpha_{i+17}$ are those given in Table 2 (these are also noted in [1]).



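Table 2. The possible combinations for $(\mu_i, \mu_{i+12}, \mu_{i+17})$ that result in
$\alpha_i = \alpha_{i+12} = \alpha_{i+17}$

$(\mu_i, \mu_{i+12}, \mu_{i+17})$    (0,0,0)                             (0,F,0)          (1,0,1)                   (F,0,F)          (F,F,F)
Values of $\bar{i}$                  $\bar{i} \notin \{4, 5, 9, 10\}$    $\bar{i} = 9$    $\bar{i} \in \{9, 10\}$   $\bar{i} = 4$    $\bar{i} \notin \{4, 9\}$
$\alpha_i$                           12                                  12               11                        12               13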



A valid triple that corresponds to a good pair is also said to be good;
otherwise the triple is said to be bad. If $\bar{i} = 4$, then all valid triples are good, and
they are used in the attack. If $\bar{i} \in \{5, 10\}$, then all valid triples are bad so these
triples are ignored. We currently do not have an efficient method of distinguishing
between the cases when $(\mu_i, \mu_{i+12}, \mu_{i+17}) = (0,F,0)$ and $(\mu_i, \mu_{i+12}, \mu_{i+17}) = (1,0,1)$,
so the attack also ignores triples with $\bar{i} = 9$.
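
The validity test on triples is a single modular subtraction. The sketch below checks it on simulated outputs in which the multiplexor difference is held constant; the function names and the constant-difference model are assumptions for illustration, not part of the paper.

```python
M32 = 0xFFFFFFFF
VALID = {0, 1, (-(1 << 16)) & M32, (1 - (1 << 16)) & M32}   # the set A, reduced mod 2^32

def swap16(w):
    return ((w << 16) | (w >> 16)) & M32

def lfg_sequence(seed, n):
    """Extend [Y_0, ..., Y_16] to n words with Y_{i+17} = Y_{i+12} + Y_i mod 2^32."""
    y = list(seed)
    while len(y) < n:
        y.append((y[-5] + y[-17]) & M32)
    return y

def outputs_with_constant_alpha(y, alpha, n):
    """V_i = SWAP(Y_i) + Y_{i+alpha} mod 2^32 for i = 0, ..., n-1."""
    return [(swap16(y[i]) + y[i + alpha]) & M32 for i in range(n)]

def is_valid_triple(v_i, v_i12, v_i17):
    """True if V_{i+17} - V_{i+12} - V_i mod 2^32 lies in the set A."""
    return ((v_i17 - v_i12 - v_i) & M32) in VALID
```

When the three multiplexor differences are equal the test always succeeds; the converse holds only with probability close to one, which is why bad triples still have to be filtered.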

If $\bar{i} \notin \{4, 5, 9, 10\}$, then a valid triple is equally likely to be either good or
bad: good when $(\mu_i, \mu_{i+12}, \mu_{i+17}) = (0,0,0)$, and bad when
$(\mu_i, \mu_{i+12}, \mu_{i+17}) = (F,F,F)$. Most of the bad triples are filtered out by
examining the values of $V'_i$ and

5, = 17 - V'i,, - V'' mod 2 ^ 6 , 

$\nu_i = ((V''_{i+12} - V'_i \bmod 2^{16}) >> 12)$

= the 4 MSBs of $(V''_{i+12} - V'_i \bmod 2^{16})$.

Table 3. The probabilities of certain properties being satisfied in the two cases where
$(\mu_i, \mu_{i+12}, \mu_{i+17}) \in \{(0,0,0), (F,F,F)\}$

$(\mu_i, \mu_{i+12}, \mu_{i+17})$    $P(\nu_i \in \{0, 1, F\})$    $P(\delta_i \in \{0, -1\})$    $P(V'_i \geq 2^{15} : \delta_i = 0)$
(0,0,0)                              1                             0.99                           0.85
(F,F,F)                              3/16                          0.51                           0.15



The attack discards valid triples with $\bar{i} \notin \{4, 5, 9, 10\}$ if

- $\nu_i \notin \{0, 1, F\}$, or

- $\delta_i \notin \{0, -1\}$, or

- $V'_i < 2^{15}$ and $\delta_i = 0$.

Following this, 0.99 x 0.85 = 0.84 of the good triples remain, while only ^ x 

0.51 X 0.15 = 0.024 of the bad triples remain. Thus, 0.024/0.84 = 0.028 (one in 
36) of the remaining valid triples are bad. The bound on $V'_i$ (when $\delta_i = 0$) can be
increased to further reduce the fraction of bad triples to good triples. However, 
this will also reduce the number of good triples that remain so the attack would 
require more key-stream. 

LFG Half-Cipher Attack Algorithm

1. Find a set of triples with $\bar{i} \notin \{5, 9, 10\}$ and $\Delta_i \in \mathcal{A}$ (valid triples). For $\bar{i} \neq 4$,
discard triples if

- $\nu_i \notin \{0, 1, F\}$,

- $\delta_i \notin \{0, -1\}$, or if

- $\delta_i = 0$ and $V'_i < 2^{15}$.

For each remaining triple, set $p_i = 1$ if $V'_i < 2^{15}$; else set $p_i = 0$.

2. From these triples, find 17 LI values $(Y'_{i+24} - Y'_i)$, for which $P(p_i = g_i)$ is
high.

3. Guess the errors $e_i$. If the number of errors gets large, then return to Step 2.

4. Compute $IV'$ from $(Y'_{i+24} - Y'_i) = V'_{i+12} - V''_i + (p_i \oplus e_i) \bmod 2^{16}$.

5. Compute $Y''_i = V'_i - Y'_{i+12} \bmod 2^{16}$, to obtain $Y_i$.

6. Compute the entire state IV from $Y_i$. Return to Step 3 if IV produces the
incorrect output.
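
Step 3 enumerates error patterns in order of increasing weight. A minimal sketch of that enumeration follows; the function names are hypothetical, and a practical implementation would presumably also order positions by the reliability of each prediction, which is not shown here.

```python
from itertools import combinations

def error_patterns(n=17, max_errors=None):
    """Yield error vectors e with 0 errors first, then 1 error, then 2, and so on."""
    if max_errors is None:
        max_errors = n
    for weight in range(max_errors + 1):
        for positions in combinations(range(n), weight):
            e = [0] * n
            for pos in positions:
                e[pos] = 1
            yield e

def carries(predictions, errors):
    """Candidate carry bits g_i = p_i XOR e_i for one error pattern."""
    return [p ^ e for p, e in zip(predictions, errors)]
```

Limiting `max_errors` gives the point at which the attack would give up on the current set of triples and return to Step 2.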



4.4 Complexity 

The number of outputs required for the attack is affected by three factors. 

1. The probability that a triple is valid. Recall that $\mu_{i+17} = \mu_{i+12} + \mu_i +
c \pmod{16}$. To obtain $(\mu_i, \mu_{i+12}, \mu_{i+17}) = (0,0,0)$, it is sufficient to have $\mu_i =
0$, $\mu_{i+12} = 0$ and $c = 0$, so the combination (0,0,0) occurs with probability
$2^{-9}$. Similarly, $(\mu_i, \mu_{i+12}, \mu_{i+17}) = (F,0,F)$ and $(\mu_i, \mu_{i+12}, \mu_{i+17}) = (F,F,F)$
occur with probability $2^{-9}$ each.








Philip Hawkes, Frank Quick, and Gregory G. Rose 



2. The probability that a valid triple is good. Of the good triples with
$\bar{i} \notin \{4, 5, 9, 10\}$, only 0.84 proceed to Step 2, while all of the good triples
with $\bar{i} = 4$ proceed to Step 2. So the probability of a good triple getting to
Step 2 is $2^{-9} \times (\frac{13}{17} \times 0.84 + \frac{1}{17} \times 1)$.

3. Finding 17 LI values of $(Y'_{i+24} - Y'_i)$ from the good triples. Assuming
that 17 good triples get to Step 2 there is no guarantee that the values
$(Y'_{i+24} - Y'_i)$ are LI. However, we found that a set of 21 values of $(Y'_{i+24} - Y'_i)$
is typically sufficient to find 17 that are LI.

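Therefore, the average number of outputs required for the attack on the LFG
half-cipher is around

$21 \times \left( 2^{-9} \times \left( \frac{13}{17} \times 0.84 + \frac{1}{17} \times 1 \right) \right)^{-1} \approx 15300$.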

There is a large variation in the time/process complexity, as the attacker will 
have to return to Step 2 if a bad triple has been selected. Our implementation 
of the attack on a 250MHz Sun UltraSPARC typically takes between 0.1 and 10 
seconds. 
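
The estimate of the number of outputs can be reproduced with a short calculation. The function below is a hypothetical helper; its default values are the figures quoted in this section (21 values needed, probability $2^{-9}$ per combination, 0.84 of good triples surviving the filter, and 13 and 1 of the 17 residues of $\bar{i}$ contributing).

```python
def expected_outputs(needed=21, p_combination=2 ** -9, keep=0.84):
    """Average number of LFG half-cipher outputs needed to collect the required good triples."""
    # 13 of 17 residues pass through the filter (fraction keep); residue 4 always passes.
    rate = p_combination * (13 / 17 * keep + 1 / 17 * 1.0)
    return needed / rate
```

The result is a little over fifteen thousand, consistent with the rounded figure given above.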

5 Conclusion 

We have demonstrated that attacks on SSC2 are computationally feasible, given
a sufficient amount of key-stream. The attack requires portions from a (currently)
prohibitive amount of continuous key-stream (around $2^{52}$ continuous outputs).
However, we suggest that the existence of this attack indicates that SSC2 is not
sufficiently secure for modern encryption requirements.





Abstract. The encryption system $E_0$, which is the encryption system
used in the Bluetooth specification, is examined. In the current paper,
a method of deriving the cipher key from a set of known keystream bits
is given. The running time for this method depends on the amount of
known keystream available, varying from $O(2^{84})$ if 132 bits are available
to $O(2^{73})$, given $2^{43}$ bits of known keystream.

Although the attacks are of no advantage if $E_0$ is used with the
recommended security parameters (64 bit encryption key), they provide an
upper bound on the amount of security that would be made available by
enlarging the encryption key, as discussed in the Bluetooth specification.



1 Introduction 

We give algorithms for deriving the initial state of the keystream generator
used within $E_0$ given some bits of keystream with less effort than exhaustive
search. From this, we derive a method for reconstructing the session encryption
key used by $E_0$ based on some amount of keystream output. $E_0$ uses a two level
rekeying mechanism, using the key to initialize the level 1 keystream generator
to produce the initial state for the level 2 keystream generator, which produces
the actual keystream used to encrypt the data.

We use a known keystream to reconstruct the initial state for the level 2 
keystream generator, which we then use to reconstruct the initial state for the 
level 1 keystream generator, from which we can directly deduce the encryption 
key. Reconstructing the state of the level 2 keystream generator takes an ex- 
pected 0(2^®) to 0(2®"*) work effort (based on the amount of known keystream 
available). Another attack with even more keystream available takes 0(2’^^) 
work. 

By reconstructing the state from either 1 or 2 packets that are encrypted 
during the same session, we can reconstruct the state of the level 1 keystream 
generator in an expected 0(2®^) or 0(2®^) time, which gives a total of 0(2’^®) 
to 0(2®"*) work effort. 

This paper is structured as follows. In Section 2, the $E_0$ keystream generator,
and how it is used within the Bluetooth system, is described. In Section 3,
previous analysis and results are summarized. Section 4 presents our base attack
against the keystream generator. Section 5 describes how to use it against the
level 2 generator. Section 6 deals with another approach to attack the keystream
generator and the second level, if a huge amount of known keystream is available.
Section 7 describes the basic attack on the first level of $E_0$, given one state of
the level 2 generator, while Section 8 deals with an attack given two such states.
Section 9 comments on attacking the full $E_0$ system. Section 10 concludes and
discusses the ramifications on the Bluetooth system.



2 Description of Eq 

Eq is an encryption protocol that was designed to provide privacy within the 
Bluetooth wireless LAN specification. When two Bluetooth devices need to com- 
municate securely, they first undergo a key exchange protocol that completes 
with each unit agreeing on a shared secret, which is used to generate the en- 
cryption key {Kc)- To encrypt a packet, this private key {Kq) is combined with 
a publicly known salt value (EN^RAND) to form an intermediate key 
Then, Kq is used in a linear manner, along with the publicly known values, the 
Bluetooth address, and a clock which is distinct for each packet, to form the 
initial state for a two level keystream generator. 

The keystream generator consists of 4 LFSRs with a total length of 128 bits, 
and a 4 bit finite state machine, referred to as the blender FSM. For each bit 
of output, each LFSR is clocked once, and their output bits are exclusive-or'ed 
together with one bit of output from the finite state machine. Then, the 4 LFSR 
outputs are summed together. The two most significant bits of this 3-bit sum 
are used to update the state of the finite state machine. We will refer to the 25 
bit LFSR as LFSR1, the 31 bit LFSR as LFSR2, the 33 bit LFSR as LFSR3 and 
the 39 bit LFSR as LFSR4. The generator is shown in Figure 1. Note that the least 
significant bit (LSB) of the sum of the four LFSRs is their bit-wise XOR. 
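
The closing remark about the LSB can be checked exhaustively. The following is a minimal sketch, not part of the original description; the helper name `sum_and_xor` is an assumption made for illustration, and the blender FSM itself is not modelled.

```python
from itertools import product

def sum_and_xor(bits):
    """Return (arithmetic sum, XOR) of a tuple of 0/1 LFSR output bits."""
    total = sum(bits)
    parity = 0
    for b in bits:
        parity ^= b
    return total, parity

# Enumerate all 16 possible output combinations of the four LFSRs.
checks = [sum_and_xor(bits) for bits in product((0, 1), repeat=4)]

# The least significant bit of the 3-bit sum always equals the XOR.
all_match = all((total & 1) == parity for total, parity in checks)
print(all_match)
```

The two most significant bits of the sum (the value `total >> 1`) are what the text says feed the blender FSM update.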

There are logically two such keystream generators. The key of the first level 
keystream generator is shifted into the LFSRs, while clearing the blender FSM. 
Then, 200 bits are generated and discarded. Then, the output of this keystream 
generator is collected, and is used to initialize the LFSRs of what we call the 
second level keystream generator, which is structurally identical to the first level 
keystream generator. This initialization is done by collecting 128 output bits, 
parallel loading them into the LFSRs, and making the initial second level FSM 
state be the final first level FSM state. 

Fig. 1. The $E_0$ keystream generator. 

^ The attacks in the current paper actually provide the value of 

Scott Fluhrer and Stefan Lucks 

The output of this second generator is then used as an additive stream cipher 
to encrypt the packet. 

3 Description of Previous Work 

In a sci.crypt.research posting [6], Markku-Juhani O. Saarinen showed an attack 
that rederived the session key. This attack consisted of guessing the states of the 
3 smaller LFSRs and the blender FSM, and using those states and the observed 
keystream to determine whether there is an output from LFSR4 that is 
consistent with that assumption. 

In the original posting, he estimated the attack to have overall complexity 
of $O(2^{100})$. However, he assumed that only 125 bits of keystream were available, 
and so he assumed a significant amount of time would be spent checking false 
hits. Since significantly more keystream is available within a packet, the true 
complexity is closer to $O(2^{93})$ expected. 

Our attacks can be viewed as refinements of Saarinen's attack by taking 
the same basic approach of guessing the initial states of part of the cipher, 
and checking for consistency. However, our attacks take advantage of additional 
relationships within $E_0$ and use them to gain some performance. 

Ekdahl and Johansson have shown in [2] how to extract the initial state 
from the keystream generator used in Eq given 0(2®^) time and 0(2®®) known 
keystream. Their attack works by exploiting some weak linear correlations be- 
tween the outputs of the LFSRs and the keystream output to verify if a guess on 
one of the LFSRs is accurate. Previous to that, Hermelin and Nyberg published 
in [4] an attack which recovered the initial state with 0(2®^) work and 0(2®"*) 
known keystream. However, these are theoretical attacks as they require a far 
larger amount of consecutive keystream output than is available. 

A time-space tradeoff attack has been described by Jakobsson and Wetzel 
[5]. Given N key streams and running time T, it is possible to recover one of 
the N keys if $N \cdot T > 2^{132}$. A similar attack on the A5 keystream generator has 
been previously described by Golic [3]. 

Our attacks resemble a general type of attack, the linear consistency attack, 
which has been described as early as 1989 by Zeng, Yang, and Rao [7]. 

4 Base Attack on the $E_0$ Keystream Generator 

The base attack rederives the initial settings of the LFSRs, given a limited 
(132 or so bits) keystream output. We will later show how this attack can be 
separately optimized for both levels of the keystream generators. For this attack, 
you assume the initial settings of the blender FSM and the contents of LFSR1 
and LFSR2, and maintain for each state the current settings of the blender 
FSM, and a set $\mathcal{L}$ of linear equations on the LFSR3 and LFSR4 output bits. 
We will refer to those output bits as $LFSR3_n$ and $LFSR4_n$. 

First, you initialize the set $\mathcal{L}$ to empty. Then, you perform the following 
depth-first search: 

1. Call the state we are examining n. Compute the exclusive-or of the output 
n of LFSR1 and LFSR2, the next output of the blender FSM (based on 
the current state), and the known keystream bit $Z_n$. If our assumptions are 
correct to this point, this must be equal to the exclusive-or of the outputs 
of LFSR3 and LFSR4. 

2. If the exclusive-or is zero, then we branch and consider the cases that both 
LFSR3 and LFSR4 output a zero here, and that they both output a one. 
When we assume a zero, we include in $\mathcal{L}$ the two linear equations $LFSR3_n = 0$ 
and $LFSR4_n = 0$, and when we assume a one, we include in $\mathcal{L}$ the two 
linear equations $LFSR3_n = 1$ and $LFSR4_n = 1$. 

3. If the exclusive-or is one, then we include in $\mathcal{L}$ the single linear equation 
$LFSR3_n \ne LFSR4_n$ (that is, $LFSR3_n \oplus LFSR4_n = 1$). 

4. If n > 33, then we include in $\mathcal{L}$ the linear equation implied by the LFSR3 
tap equations. If n > 39, then we include in $\mathcal{L}$ the linear equation implied 
by the LFSR4 tap equations. In both cases, we check to see if the new 
equations are inconsistent with the equations already in $\mathcal{L}$. If they are, then 
some assumption we made is incorrect and we backtrack to consider the next 
case. 

5. Compute the next state of the blender FSM. This is always possible, as the 
next state depends on the current state (which we know) and the number of 
LFSRs that output a one, which we know. 

6. If n is more than 132, then we have found with high probability the initial 
state of the encryption engine. If not, then we continue this search for state 
n + 1. 
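
The consistency check in step 4 can be sketched as an incrementally maintained system of linear equations over GF(2). This is only an illustration under stated assumptions: the class name `GF2System`, the bitmask encoding of equations, and the `copy` method used for backtracking are all hypothetical and are not taken from the paper.

```python
class GF2System:
    """Incrementally maintained system of linear equations over GF(2).

    An equation is (mask, rhs): bit i of mask is set when variable i
    appears in the equation, and rhs is the 0/1 right-hand side.
    """

    def __init__(self):
        # Maps the index of a leading variable to one stored equation.
        self.pivots = {}

    def add(self, mask, rhs):
        """Add an equation; return False if it contradicts the system."""
        while mask:
            lead = mask.bit_length() - 1
            if lead not in self.pivots:
                self.pivots[lead] = (mask, rhs)
                return True
            pivot_mask, pivot_rhs = self.pivots[lead]
            mask ^= pivot_mask   # eliminate the leading variable
            rhs ^= pivot_rhs
        # The equation reduced to 0 = rhs: redundant if rhs is 0,
        # contradictory if rhs is 1.
        return rhs == 0

    def copy(self):
        """Snapshot the system before branching, so we can backtrack."""
        clone = GF2System()
        clone.pivots = dict(self.pivots)
        return clone


system = GF2System()
ok1 = system.add(0b011, 1)      # x0 + x1 = 1
ok2 = system.add(0b110, 0)      # x1 + x2 = 0
branch = system.copy()
ok3 = system.add(0b101, 0)      # x0 + x2 = 0 contradicts the first two
ok4 = branch.add(0b101, 1)      # x0 + x2 = 1 is implied, hence consistent
```

In a depth-first search, a `False` return corresponds to the backtracking case in step 4, and the snapshot taken with `copy` is what the search would return to.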

There are two ideas behind this algorithm. The first is that the next state 
function for the blender FSM depends only on the number of LFSRs that output 
a one. So, when we assume that the outputs of LFSR3 and LFSR4 differ, we 
need not decide which one outputs a zero and which one outputs a one - instead, 
we can just note the fact that they differ and continue the search. 

The other idea is that systems of linear equations in GF(2) can be quite 
efficiently examined for contradictions. 

How efficient is this attack? We provide some heuristic arguments. First, 
consider the case that all the assumed bits of LFSRs 1 and 2 and the blender 
state are correct. 

With every step we learn if the sum S of the two output bits is either (a) 
$S \in \{0, 2\}$ or (b) $S = 1$. Both cases (a) and (b) are equally likely. 

Note Prob[S = 1] = 0.5, and Prob[S = 0] = Prob[S = 2] = 0.25. If S = 1, 
we learn one linear equation on the state bits of LFSRs 3 and 4 (namely the 
XOR of the two current output bits). If $S \in \{0, 2\}$, we branch and consider both 
S = 0 and S = 2. Both S = 0 and S = 2 provide us with two linear equations 
on the state bits of LFSRs 3 and 4. 

On the average, we expect to learn 1.5 linear equations and branch 0.5 times 
for each step. Once we have learned in total 33+39=72 equations, we are in a leaf 
of the branch tree and know or “have guessed” all bits in the system. The number 
of such leaves describes the amount of work. (Note that this analysis is based on 
the heuristic assumption that no equations are redundant or contradictory, or 
rather, that the effects of redundant and contradictory equations on the amount 
of work cancel out.) 

So, our branch tree has an "average" size determined by $2^{72/3} = 2^{24}$ leaves 
(72 equations at 1.5 equations per step take 48 steps, with 0.5 branches per step). 
We initially assumed 60 bits and can expect to have made a correct assumption 
after trying $2^{59}$ times, which gives us a running time of $O(2^{59+24}) = O(2^{83})$ on 
the average. 

Experiments demonstrate that our heuristic arguments on the efficiency of 
the attack are reasonable, though perhaps a bit optimistic. For a random incorrect 
guess of initial state, the procedure examines an average of approximately 60 
million ($\approx 2^{26}$) states before terminating. Thus we can reconstruct the encryption 
engine state in 

0(2®®) expected time. 

However, for both the first level and the second level keystream generator, we 
can take advantage of special conditions that allow us to further optimize the 
attack. 

5 Attack on the Second Level $E_0$ Keystream Generator 

To optimize the attack against the second level keystream generator (which 
produces the observed keystream directly), we note that the base attack is more 
efficient if the outputs of LFSR3 and LFSR4 exclusive-or'ed together happen 
to have a high Hamming weight. To take advantage of this, we extend the attack 
by assuming that, at a specific point in the keystream, the next n + 1 bits of 
LFSR3 exclusive-or'ed with LFSR4 are n ones followed by a zero, where n will 
be less than the length of the LFSRs. Since LFSR outputs are effectively random 
and independent with such a length (since both LFSRs can generate any n + 1 
bit pattern at any time with approximately equal probability if n < 32), the 
probability that an output of length n + k contains such a sequence is approximately 
$k \cdot 2^{-n}$ (for $k \ll 2^n$). 

If the assumption that the LFSRs produce such an output at the specific point 
in the keystream is false, we will fail to discover the internal state. However, the 
amount of work required to make that determination turns out to be rather less 
than 0(2®®“”), and so if we have 2" or more starting places to test out, we 
will find a place where the above procedure discovers the initial state with high 
probability. 

The expected amount of time the base attack will take when we precondition 
the assumed outputs of LFSR3 and LFSR4 can be experimentally obtained. The 
results are given in Table 1, together with the expected time for the full search. 

Table 1. The expected complexity and plaintext required for various values of n. Base 
Search Time is the expected number of nodes traversed in a single run of the base 
attack. Expected Plaintext Required is the expected amount of plaintext we need to 
prosecute the attack. Expected Search Time is the expected total search time taken. 

  n    Base Search Time    Expected Plaintext Required    Expected Search Time 
  5    $2^{24.8}$          165 bytes                      $2^{83.8}$ 
 10    $2^{23.5}$          1157 bytes                     $2^{82.5}$ 
 15    $2^{22.1}$          33k                            $2^{81.1}$ 
 20    $2^{20.5}$          1M                             $2^{79.5}$ 
 25    $2^{18.8}$          32M                            $2^{77.8}$ 
 30    $2^{17.1}$          1G                             $2^{76.1}$ 



Looking through this table, we can see that modest amounts of keystream reduce 
the expected work somewhat; however, vast quantities of keystream reduce the 
expected work only slightly further. 

Formally, the algorithm is: 

1. Select a position in the known keystream that is the start of more than 132 
consecutive known bits. 

2. Cycle through all possible combinations of 4 bits of blender FSM state, 25 
bits of LFSR1 state and the last 30 - n bits of LFSR2 state. 

3. Compute the initial n + 1 bits of LFSR2 state that is consistent with the 
exclusive-or of LFSR3 and LFSR4 consisting of n ones and then zero. 

4. Run the base attack on that setting. Stop if it finds a consistent initial 
setting. 

The above algorithm runs the base attack $2^{59-n}$ times and has a $2^{-n}$ probability 
of success for a single location. 
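
As a cross-check, the entries of Table 1 can be compared against the bit counts in step 2. The sketch below assumes that the expected total work is (number of guessed settings per position) times (expected number of positions tried, taken as $2^n$) times (base search time); that cost model and the helper names are illustrative assumptions, and all quantities are handled as base-2 logarithms.

```python
# (n, log2 of base search time, log2 of expected search time) from Table 1.
TABLE_1 = [
    (5, 24.8, 83.8),
    (10, 23.5, 82.5),
    (15, 22.1, 81.1),
    (20, 20.5, 79.5),
    (25, 18.8, 77.8),
    (30, 17.1, 76.1),
]

def guessed_bits(n):
    """Bits cycled through in step 2: blender FSM, LFSR1, part of LFSR2."""
    return 4 + 25 + (30 - n)

def expected_total_log2(n, base_log2):
    """log2 of (settings per position) * (about 2^n positions) * (base cost)."""
    return guessed_bits(n) + n + base_log2

# Rows whose tabulated total differs from the model by more than rounding.
mismatches = [
    n for n, base_log2, total_log2 in TABLE_1
    if abs(expected_total_log2(n, base_log2) - total_log2) > 0.05
]
print(mismatches)
```

Under this model every row differs from its base search time by the same factor of $2^{59}$, so the gain from larger n comes only from the cheaper base search.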

Note that, even though a single packet has a payload with a maximum of 
2745 bits, we can have considerably more than 2745 bits of known keystream, 
if we know the plaintext of multiple packets. All the next phase of the attack 
needs to know is the initial state of the second level keystream generator for a 
packet - it does not matter which. If we have multiple packets, we can try all of 
them, and we will be successful if we manage to find the initial state for any of 
them. 



6 Another Attack on the Second Level Generator 

Given a huge amount of known keystream, there is another technique to attack 
the second level keystream generator more efficiently. The basic attack requires 
assuming the blender state and the states of both LFSR1 and LFSR2 (i.e. 
4 + 25 + 31 bits = 60 bits). Now, we start with assuming only the blender and 
LFSR1 states (29 bits), at the beginning of the attack. During the course of the 
attack, we continue to make assumptions on how the blender state is updated. 

Denote the sum of the outputs of LFSR2, LFSR3, and LFSR4 by S. Obviously, 
$S \in \{0, 1, 2, 3\}$. Since we always know (based on previous assumptions) the 
current blender and LFSR1 state, we only need to know S in order to compute 
the next blender state. The current output bit tells if S is odd or not. Thus, we 
know if either (a) $S \in \{0, 2\}$ or (b) $S \in \{1, 3\}$. 

Both cases (a) and (b) are equally likely. And in both cases we learn one 
linear equation, namely we learn the XOR of the output bits of the LFSRs 2-4. 

Now consider the conditional probabilities Prob[S = 2 | (a)] and Prob[S = 1 | (b)]. 
Assuming the three output bits are independent uniformly distributed random 
bits (which they are, approximately), we get 

Prob[S = 2 | (a)] = Prob[S = 1 | (b)] = 0.75. 

Instead of branching, as we did in the base attack, we simply assume the likely 
case $S \in \{1, 2\}$, ignoring S = 0 and S = 3. 
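
The value 0.75 follows from counting the eight equally likely output combinations of three independent uniform bits. A minimal sketch of that count follows; the function name `conditional` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

def conditional(target, parity):
    """Prob[S = target | S mod 2 = parity] for the sum S of three uniform bits."""
    outcomes = [sum(bits) for bits in product((0, 1), repeat=3)]
    matching = [s for s in outcomes if s % 2 == parity]
    return Fraction(matching.count(target), len(matching))

p_two_given_even = conditional(2, 0)   # case (a): S in {0, 2}
p_one_given_odd = conditional(1, 1)    # case (b): S in {1, 3}
print(p_two_given_even, p_one_given_odd)
```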

We need 31 + 33 + 39 = 103 linear equations to entirely restore the states of 
the LFSRs 2-4. The assumptions we get here are linearly independent. If both 
our initial assumptions on the 29 state bits of blender and LFSR1 and our 103 
assumptions on the sum S are correct, we have restored the correct state. 
We can check so by computing S output bits (with S > 29) and comparing the 
output stream we get by our assumed $E_0$ state with the true output stream. 

Within these 103 clocks the random variable S takes 103 values 
$S_1, S_2, \ldots \in \{0, 1, 2, 3\}$ with $Prob[S_i \in \{1, 2\}] = 0.75$. The attack works if 
$S_1 \in \{1, 2\}$ and $S_2 \in \{1, 2\}$ and ... and $S_{103} \in \{1, 2\}$. Making the heuristic 
(but apparently plausible) argument that the $S_i$ behave like 103 independent 
random variables, the probability 
$p = Prob[S_1 \in \{1, 2\}$ and ... and $S_{103} \in \{1, 2\}]$ is 

$p = 0.75^{103} \approx 1.35 \cdot 10^{-13} \approx 2^{-42.7}$. 

If the initially assumed 29 bits are correct, the attack requires less than $2^{43}$ bits 
of known keystream and less than $2^{43}$ steps (each step means to solve a system 
of 103 linear equations). Thus the entire attack needs 

less than $2^{43}$ bits of known keystream 

and 

less than $2^{72}$ steps. 
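
The arithmetic behind these bounds can be reproduced directly. The sketch below assumes, as the heuristic argument does, 103 independent assumptions each holding with probability 0.75, and about 1/p starting positions per guess of the 29 bits; the variable names are illustrative.

```python
import math

ASSUMPTIONS = 103          # linear equations needed for LFSRs 2-4
GUESSED_BITS = 29          # blender FSM (4 bits) plus LFSR1 (25 bits)

p = 0.75 ** ASSUMPTIONS            # probability that all assumptions hold
log2_p = math.log2(p)              # about -42.7

# Expected number of starting positions per correct guess is about 1/p,
# which is below 2^43; cycling through the 29 guessed bits multiplies this.
positions_log2 = math.ceil(-log2_p)
total_steps_log2 = GUESSED_BITS + positions_log2

print(p, log2_p, positions_log2, total_steps_log2)
```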



7 Attack on the First Level $E_0$ Keystream Generator 

To attack the first level keystream generator (which produces the initial LFSR 
and blender FSM states), we first note that the key setup sets the FSM state of 
the second level keystream generator to be the final contents of the FSM state 
after the first level generator has produced the last bit for the LFSR state. We 
also note that the next-state function of the cipher is invertible - the LFSRs 
can be run backwards as easily as forwards, and the FSM next state function is 
invertible given a current LFSR state. We can also test the base attack, and find 
that it works essentially as well on the backwards cipher as it does on the forward 
cipher. 
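
To illustrate why an LFSR can be run backwards, here is a small sketch of a Fibonacci-style register with forward and backward steps. The 8-bit length and the tap positions are hypothetical toy values, not the Bluetooth feedback polynomials, and the blender FSM is not modelled.

```python
def step_forward(state, taps):
    """Clock once: output state[0], shift, append the XOR of the tapped bits."""
    feedback = 0
    for t in taps:
        feedback ^= state[t]
    return state[1:] + [feedback], state[0]

def step_backward(state, taps):
    """Undo one step_forward. Requires position 0 to be among the taps."""
    # All previous bits except the oldest are still in the register,
    # shifted by one place; the oldest bit follows from the feedback equation.
    oldest = state[-1]
    for t in taps:
        if t != 0:
            oldest ^= state[t - 1]
    return [oldest] + state[:-1]

TOY_TAPS = (0, 2, 3, 4)                 # hypothetical taps for an 8-bit toy LFSR
initial = [1, 0, 0, 1, 1, 0, 1, 0]

state = list(initial)
for _ in range(50):
    state, _bit = step_forward(state, TOY_TAPS)
advanced = list(state)
for _ in range(50):
    state = step_backward(state, TOY_TAPS)
recovered = state
print(recovered == initial)
```

Because the oldest bit always enters the feedback, each forward step loses no information, which is the property the reversed attack relies on.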

This suggests this attack: when given one state of the level 2 generator, cycle 
through all possible combinations of 25 bits of LFSR1 state and 31 bits of LFSR2 
state, and use the base attack on the reversed cipher, using as the initial FSM 
contents the initial contents of the level 2 FSM. Because we are cycling through 
an expected $O(2^{56})$ LFSR states, and each check is expected to take $O(2^{25})$ time, 
we should expect to find the first level initial position in $O(2^{81})$ time. 

8 Attack on the First Level $E_0$ Keystream Generator 
Given Two Second Level Keystreams 

Now, let us consider a possible attack if the attacker has the first level output 
for two distinct packets that were sent with the same key. In this case, we first 
note that each keystream has a clock associated with it, and that the clock is 
the only thing that differs. We further note that the method of combination is 
linear, hence if we know the xor differential in the clock (which we do, because 
we know the actual clock values), we know the xor differential of the first level 
LFSRs. 

We can use this to optimize the attack further, as follows, where we will 
indicate the two known sides with $x_a$ and $x_b$, and where $\mathcal{L}$ is a set of linear 
equations on the outputs of $LFSR2_a$, $LFSR3_a$, $LFSR4_a$. 

Assume the contents of $LFSR1_a$ (which also gives you $LFSR1_b$, because of 
the known differential between the two). 

Initialize the set $\mathcal{L}$ to empty. 

Perform the following depth-first search: 

1. Call the state we are examining n. Compute the output of 
$LFSR1_a$, $LFSR1_b$, the previous output of the blender FSMs (based 
on the current state), and the known keystream bit $Z_n$. If our 
assumptions are correct to this point, this must be equal to the 
exclusive-or of the outputs of $LFSR2_a$, $LFSR3_a$, $LFSR4_a$ and of $LFSR2_b$, 
$LFSR3_b$, $LFSR4_b$. 

2. Check the known differential in $LFSR2_a$, $LFSR3_a$, $LFSR4_a$, $LFSR2_b$, 
$LFSR3_b$, $LFSR4_b$ to see if there is a setting of those bits that satisfies 
both the known xors and the known differentials. If there is not, then 
backtrack to consider the next case. 

3. If we reach here, there are four possible settings of the outputs of $LFSR2_a$, 
$LFSR3_a$, $LFSR4_a$ which are consistent with known xors and differentials. 
At least two of those settings will also update both blender FSMs 
identically, and will differ in precisely two bits. Here, we branch and 
consider three cases: one case that corresponds to the two settings which 
update both blender FSMs identically, and the other two cases 
corresponding to the other two settings. For the first case, we include in $\mathcal{L}$ 
the linear equation implied by the two bits that differ, and the linear 
equation implied by the third bit setting. For the other two cases, we 
include in $\mathcal{L}$ three linear equations giving the three bit settings. 

4. If n > 31, then we include in $\mathcal{L}$ the linear equation implied by the 
$LFSR2_a$ tap equations. 

5. If n > 33, then we include in $\mathcal{L}$ the linear equation implied by the 
$LFSR3_a$ tap equations. If n > 39, then we include in $\mathcal{L}$ the linear 
equation implied by the $LFSR4_a$ tap equations. In all three cases, we 
check to see if the new equations are inconsistent with the equations 
already in $\mathcal{L}$. If they are, then some assumption we made is incorrect 
and we backtrack to consider the next case. 

6. Compute the previous state of the blender FSMs. This is always possible, 
as the next state depends on the current state (which we know) and the 
number of LFSRs that output a one, which we know. 

7. If n is more than 128, then we have found with high probability the initial 
states of the encryption engines. If not, then we continue this search for 
state n + 1. 

Experiments show that the above procedure examines an expected 0(2^^) nodes 
during the search. 

9 Attack Against Full $E_0$ 

Below is how we can combine these attacks into an attack on the full $E_0$ 
encryption system. 

Assume we have an amount of known keystream generated with an unknown 
session key, which may be from a single packet or it may be from multiple 
packets. We select n based on the amount of known keystream. We can then 
use the attack shown in Section 5 to find the initial LFSR and blender FSM 
settings for a packet generated by that session key. If the cost of finding the 
initial LFSR and blender FSM settings for a second packet is less than $O(2^{81})$, 
then we find a second one. Then, we either use the attack shown in Section 7 to 
find all possible initial LFSR settings that generated that initial setting (if we 
have one initial LFSR setting) , or we use the attack shown in Section 8 if we have 
two initial LFSR settings. Once we find the initial LFSR settings that generate 
the observed output, we can step the LFSRs back 200 cycles, and use linear 
transformations to eliminate the Bluetooth address and the clock to reconstruct 
the session key, and verify that potential key by using it to decrypt other 
packets. 

If we denote the amount of effort to find a LFSR and blender setting given 
n bytes of known keystream as $F(n)$ (see Table 1), then the total effort for this 
attack is 

$O(\min(F(n) + 2^{81},\ 2F(n/2) + 2^{51}))$ work. 

This is $O(2^{84})$ if you have barely enough keystream to uniquely identify the 
session key (e.g., 140 bits), and drops to $O(2^{77})$ if you have a gigabit of known 
keystream. 

Fig. 2. Expected work effort required to recover session key, versus known keystream. 



We can further reduce the effort down to 

$O(2^{73})$ work, 

if about 14 000 gigabits of keystream are available. We simply use the attack 
from Section 6 twice, to recover two states of the level 2 generator, and then 
continue with the attack from Section 8. 

These results are summarized in Figure 2. 

10 Conclusions and Open Problems 

We described methods for rederiving the session key for $E_0$ given a limited 
amount of known keystream. This session key will allow the attacker to decrypt 
all messages in that session. We showed that the real security level of $E_0$ is 
no more than 73-84 bits (depending on the amount of keystream available to the 
attacker), and that larger key lengths suggested by the Bluetooth specification^ 
would not provide additional security. 

^ “For the encryption algorithm, the key size may vary between 1 and 16 octets (8-128 
bits). The size of the encryption key shall be configurable for two reasons. [First is 
export provisions]. The second reason is to facilitate a future upgrade path for the 
security without a costly redesign of the algorithms and the encryption hardware; 
increasing the effective key size is the simplest way to combat increased computing 
power at the opponent side. Currently (1999) it seems that an encryption key size of 
64 bits gives satisfying protection for most applications.” [1, Section 14, page 148] 

We empirically observed that the technique from Section 6 (assume the 
blender state and LFSR1 only, and build up a set of equations based on the 
states of LFSR2, LFSR3 and LFSR4) posed some practical problems, because 
the equations created are rather complex. Also, the technique requires a huge 
amount of known keystream. It would be interesting to develop improved 
techniques to handle the set of linear equations more efficiently. Also, it would be 
interesting to reduce the required amount of known keystream. 

Another approach for more practical attacks on $E_0$ and Bluetooth would be 
to exploit the weak mixing of the clock into the first level LFSRs, which will, at 
attacker known times, leave three of the LFSRs with zero differential. 

Abstract. Cryptographic Boolean functions should have large distance 
to functions with simple algebraic description to avoid cryptanalytic at- 
tacks based on successive approximation of the round function such as 
the interpolation attack. Hyper-bent functions achieve the maximal min- 
imum distance to all the coordinate functions of all bijective monomials. 
However, this class of functions exists only for functions with even num- 
ber of inputs. In this paper we provide some constructions for Boolean 
functions with odd number of inputs that achieve large distance to all 
the coordinate functions of all bijective monomials. 

Key words. Boolean functions, hyper-bent functions, extended Hadamard 
transform, Legendre sequences, nonlinearity. 



1 Introduction 

Several cryptanalytic attacks on block ciphers are based on approximating the 
round function (or S-box) with a simpler one. For example, linear cryptanaly- 
sis [13] is based on approximating the round function with an affine function. 
Another example is the interpolation attack [10] on block ciphers using simple 
algebraic functions as S-boxes and the extended attack in [11] on block ciphers 
with probabilistic nonlinear relation of low degree. 

Thus, cryptographic functions used in the construction of the round function 
should have a large distance to functions with simple algebraic description. 
Along this line of research, Gong and Golomb [9] introduced a new S-box 
design criterion. By showing that many block ciphers can be viewed as a 
nonlinear feedback shift register with input, Gong and Golomb proposed that 
S-boxes should not be approximated by a bijective monomial. The reason is that, 
for $\gcd(c, 2^N - 1) = 1$, the trace functions $Tr(\zeta x^c)$ and $Tr(\lambda x)$, $x \in GF(2^N)$, are 
both m-sequences with the same linear span. 

For Boolean functions with even number of input variables, bent functions 
achieve the maximal minimum distance to the set of affine functions. In other 
words, they achieve the maximal minimum distance to all the coordinate 
functions of affine monomials (i.e., functions in the form $Tr(\lambda x) + e$). However, this 
does not guarantee that such bent functions cannot be approximated by the 
coordinate functions of bijective monomials (i.e., functions in the form $Tr(\lambda x^c) + e$, 
$\gcd(c, 2^N - 1) = 1$). At Eurocrypt 2001, Youssef and Gong [19] introduced a 
new class of bent functions which they called hyper-bent functions. Functions 
within this class achieve the maximal minimum distance to all the coordinate 
functions of all bijective monomials. 

In this paper we provide some constructions for Boolean functions with odd 
number of inputs that achieve large distance to all the coordinate functions 
of all bijective monomials. Unlike the N even case, bounding the nonlinearity 
(NL) for functions with odd number of inputs, N, is still an open problem. 
For N = 1, 3, 5 and 7, it is known that max NL = $2^{N-1} - 2^{(N-1)/2}$. However, 
Patterson and Wiedemann [15], [16] showed that for N = 15, max NL $\ge$ 16276 = 
$16384 - \frac{27}{32} 2^{(N-1)/2}$. It should be noted that our task, i.e., finding functions with 
large distance to all the coordinate functions of all bijective monomials, is far 
more difficult than finding functions with large nonlinearity. For example, while 
the (experimental) average nonlinearity for functions with N = 11 and 13 is 
about 941 and 3917 respectively, the (experimental) average minimum distance 
to the coordinate functions of all bijective monomials is about 916 and 3857 
respectively. 

We conclude this section with the notation and concepts which will be used 
throughout the paper. 

- F = GF(2). 

- E = $GF(2^N)$. 

- $Tr_M^N(x)$, $M \mid N$, represents the trace function from $F_{2^N}$ to $F_{2^M}$, i.e., 
$Tr_M^N(x) = x + x^q + \cdots + x^{q^{l-1}}$ where $q = 2^M$ and $l = N/M$. If M = 1 and the 
context is clear, we write it as Tr(x). 

- $a = \{a_i\}$, a binary sequence with period $s \mid 2^N - 1$. Sometimes, we also use 
a vector of dimension s to represent a sequence with period s. I.e., we also 
write $a = (a_0, a_1, \ldots, a_{s-1})$. 

- Per(b), the period of a sequence b. 

- $a^{(t)}$ denotes the sequence obtained by decimating the sequence a by t, i.e., 
$a^{(t)} = \{a_{tj}\}_{j \ge 0} = a_0, a_t, a_{2t}, \ldots$. 

- w(s): the number of 1's in one period of the sequence s or the number of 
1's in the set of images of the function $s(x) : GF(2^N) \to GF(2)$. This is the 
so-called Hamming weight of s whether s is a periodic binary sequence 
or a function from $GF(2^N)$ to GF(2). 

- $\mathcal{S}$ denotes the set of all binary sequences with period $r \mid 2^N - 1$. 

- $\mathcal{F}$ denotes the set of all (polynomial) functions from $GF(2^N)$ to GF(2). 

2 Preliminaries 

The trace representation of any binary sequence with period dividing $2^N - 1$ is 
a polynomial function from $GF(2^N)$ to GF(2). Any such polynomial function 
corresponds to a Boolean function in N variables. This leads to a connection 
among sequences, polynomial functions and Boolean functions. Using this 
connection, pseudo-random sequences are rich resources for constructing functions 
with good cryptographic properties. 

Any non-zero function $f(x) \in \mathcal{F}$ can be represented as 

$f(x) = \sum_{i=1}^{s} Tr_1^{m_{t_i}}(A_i x^{t_i}), \quad A_i \in GF(2^{m_{t_i}})$ (1) 

where $1 \le s \le |\Gamma_2(2^N - 1)|$, $\Gamma_2(2^N - 1)$ is the set of coset leaders modulo $2^N - 1$, 
$t_i$ is a coset leader of a cyclotomic coset modulo $2^N - 1$, and $m_{t_i} \mid N$ is the size of 
the cyclotomic coset containing $t_i$. For any sequence $a = \{a_i\} \in \mathcal{S}$, there exists 
$f(x) \in \mathcal{F}$ such that 

$a_i = f(\alpha^i), \quad i = 0, 1, \ldots,$ 

where $\alpha$ is a primitive element of E. f(x) is called the trace representation of a. 
(a is also referred to as an s-term sequence.) If f(x) is any function from E to 
F, by evaluating $f(\alpha^i)$, we get a sequence over F with period dividing $2^N - 1$. 
Thus 

$\delta : a \leftrightarrow f(x)$ (2) 

is a one-to-one correspondence between $\mathcal{F}$ and $\mathcal{S}$ through the trace representation 
in (1). We say that f(x) is the trace representation of a and a is the evaluation of 
f(x) at $\alpha$. In this paper, we also use the notation $a \leftrightarrow f(x)$ to represent the fact 
that f(x) is the trace representation of a. The set consisting of the exponents 
that appear in the trace terms of f(x) is said to be the null spectrum set of f(x) 
or a. 

If s = 1, i.e., 

$a_i = Tr_1^N(\beta \alpha^i), \quad i = 0, 1, \ldots, \quad \beta \in E^*,$ 

then a is an m-sequence over F of period $2^N - 1$ of degree N. (For a detailed 
treatment of the trace representation of sequences, see [14]). 

3 Extended Transform Domain Analysis 
for Boolean Functions 

The Hadamard transform of $f : E \to F$ is defined by [1] 

$\hat{f}(\lambda) = \sum_{x \in E} (-1)^{f(x) + Tr(\lambda x)}, \quad \lambda \in E.$ (3) 

The Hadamard transform spectrum of f exhibits the nonlinearity of f. More 
precisely, the nonlinearity of f is given by 

$NL(f) = 2^{N-1} - \frac{1}{2} \max_{\lambda \in E} |\hat{f}(\lambda)|,$ 

which indicates that the absolute value of $\hat{f}(\lambda)$ reflects the difference between 
agreements and disagreements of f(x) and the linear function $Tr(\lambda x)$. Only bent 
functions [17] have a constant spectrum of their Hadamard transform. Gong 
and Golomb [9] showed that many block ciphers can be viewed as a nonlinear 
feedback shift register with input. In the analysis of shift register sequences [4], 
all m-sequences are equivalent under the decimation operation on elements in a 
sequence. The same idea can be used to approximate Boolean functions, i.e., we 
can use monomial functions instead of linear functions to approximate Boolean 
functions. 
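
As a concrete illustration of the spectrum and of NL(f), the sketch below computes the transform of a Boolean function given as a truth table. It is written under an assumption: it uses the inner product of bit vectors in place of $Tr(\lambda x)$, which ranges over the same set of linear functions once a basis of E is fixed, so the maximum (and hence NL) is the same. The function names are illustrative.

```python
def walsh_hadamard(truth_table):
    """Return the sums over x of (-1)^(f(x) + <lam, x>) for every lam.

    truth_table is a list of 0/1 values of length 2^N, indexed by x.
    """
    spectrum = [1 - 2 * bit for bit in truth_table]   # (-1)^f(x)
    size = len(spectrum)
    half = 1
    while half < size:                                # fast butterfly passes
        for start in range(0, size, 2 * half):
            for i in range(start, start + half):
                a, b = spectrum[i], spectrum[i + half]
                spectrum[i], spectrum[i + half] = a + b, a - b
        half *= 2
    return spectrum

def nonlinearity(truth_table):
    """NL(f) = 2^(N-1) - (1/2) * max |spectrum value|."""
    peak = max(abs(v) for v in walsh_hadamard(truth_table))
    return len(truth_table) // 2 - peak // 2

# x0*x1 + x2*x3 on four variables, a standard bent function.
bent = [((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
        for x in range(16)]
# x0 + x3, a linear function.
linear = [(x & 1) ^ ((x >> 3) & 1) for x in range(16)]

bent_spectrum = walsh_hadamard(bent)
print(nonlinearity(bent), nonlinearity(linear))
```

The bent example shows the constant-magnitude spectrum mentioned above, while the linear function has a single spectral peak and nonlinearity zero.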

Gong and Golomb [9] introduced the concept of extended Hadamard trans- 
form (EHT) for a function from E to F. The extended Hadamard transform is 
defined as follows. 

Definition 1. Let f(x) be a function from E to F. Let 

$\hat{f}(\lambda, c) = \sum_{x \in E} (-1)^{f(x) + Tr(\lambda x^c)}$ (4) 

where $\lambda \in E$ and c is a coset leader modulo $2^N - 1$ co-prime to $2^N - 1$. Then we 
call $\hat{f}(\lambda, c)$ an extended Hadamard transform of the function f. 

Notice that the Hadamard transform of f, defined by (3), is $\hat{f}(\lambda, 1)$. The 
numerical results in [9] show that, for all the coordinate functions $f_i$, $i = 1, \ldots, 32$ of 
the DES s-boxes, the distribution of $\hat{f}_i(\lambda, c)$ in $\lambda$ is invariant for all c. 

Thus a new generalized nonlinearity measure can be defined as 

$NLG(f) = 2^{N-1} - \frac{1}{2} \max_{\lambda \in E,\ c : \gcd(c, 2^N - 1) = 1} |\hat{f}(\lambda, c)|.$ 

This leads to a new criterion for the design of Boolean functions used in 
conventional cryptosystems. The EHT of Boolean functions should not have any 
large component. 
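
For small N the extended transform and NLG can be computed by brute force. The sketch below is an illustration under stated assumptions: it fixes N = 4 and represents $GF(2^4)$ with the polynomial $x^4 + x + 1$ (a choice made here, not taken from the paper), and it loops over every exponent c coprime to $2^N - 1$ rather than only coset leaders, which does not change the maximum.

```python
from math import gcd

N = 4
MODULUS = 0b10011            # x^4 + x + 1, assumed field polynomial
ORDER = (1 << N) - 1         # 2^N - 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^N) given as integers below 2^N."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MODULUS
    return result

def gf_pow(a, e):
    """Raise a to the power e (e >= 1) by square and multiply."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def trace(x):
    """Tr(x) = x + x^2 + ... + x^(2^(N-1)), which lies in {0, 1}."""
    total = 0
    for _ in range(N):
        total ^= x
        x = gf_mul(x, x)
    return total

def eht(f, lam, c):
    """Extended Hadamard transform of f at (lam, c)."""
    return sum((-1) ** (f(x) ^ trace(gf_mul(lam, gf_pow(x, c))))
               for x in range(1 << N))

def nlg(f):
    exponents = [c for c in range(1, ORDER) if gcd(c, ORDER) == 1]
    peak = max(abs(eht(f, lam, c))
               for lam in range(1 << N) for c in exponents)
    return (1 << (N - 1)) - peak // 2

def nl(f):
    peak = max(abs(eht(f, lam, 1)) for lam in range(1 << N))
    return (1 << (N - 1)) - peak // 2

def monomial(x):
    """Tr(x^7); 7 is coprime to 15, so this is a bijective monomial."""
    return trace(gf_pow(x, 7))

print(nl(monomial), nlg(monomial))
```

The example function is nonlinear in the ordinary sense, yet its generalized nonlinearity is zero because it coincides with a coordinate function of a bijective monomial, which is exactly the situation the criterion is meant to exclude.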

In what follows we will provide constructions for Boolean functions with large 
distance to all the coordinate functions of bijective monomials. The construction 
method depends on whether N is a composite number or not. 

4 Case 1: N Is a Composite Number 

Let N = nm where n, m > 1. Let $b = \{b_j\}_{j \ge 0}$ be a binary sequence with 
$per(b) = d = \frac{q^n - 1}{q - 1}$, $q = 2^m$, and $w(b) = v$. Let $g(x) \leftrightarrow b$. In the following, we 
derive some bounds on NLG(g) in terms of v. 

Write $a_i = Tr_1^{nm}(\alpha^i)$, $i = 0, 1, \ldots$. Thus $a = \{a_i\}$ is an m-sequence of period 
$2^N - 1$. Let 

$\delta(\tau) = |\{0 \le i < d \mid b_i = 1,\ Tr_m^{nm}(\alpha^{i+\tau}) = 0\}|.$ 

Lemma 1. With the above notation, we have 

$w(Tr(\alpha^\tau x^c) + g(x)) = 2^{nm-1} - v + q\delta(\tau).$ (5) 

Proof. Throughout the proof, we will write $\delta(\tau)$ as $\delta$ for simplicity. The sequence 
a can be arranged into a (q - 1, d)-interleaved sequence [8]. Thus a can be 
arranged into the following array 

$A = \begin{bmatrix} a_0 & a_1 & \cdots & a_{d-1} \\ a_d & a_{d+1} & \cdots & a_{2d-1} \\ \vdots & \vdots & & \vdots \\ a_{d(q-2)} & a_{d(q-2)+1} & \cdots & a_{(q-1)d-1} \end{bmatrix} = [A_0, A_1, \ldots, A_{d-1}],$ 

where the $A_i$'s are columns of the matrix. Similarly we can arrange the sequence b 
in the following array 

$B = \begin{bmatrix} b_0 & b_1 & \cdots & b_{d-1} \\ b_d & b_{d+1} & \cdots & b_{2d-1} \\ \vdots & \vdots & & \vdots \\ b_{d(q-2)} & b_{d(q-2)+1} & \cdots & b_{(q-1)d-1} \end{bmatrix}.$ 

Note that $w(A) = |\{(i, j) \mid a_{ij} = 1,\ 0 \le i < q - 1,\ 0 \le j < d\}|$. Thus 

$w(A + B) = \sum_{b_i = 0} w(A_i) + \sum_{b_i = 1} w(A_i + 1) = \sum_{b_i = 0} w(A_i) + \sum_{b_i = 1} (q - 1 - w(A_i)).$ 

In the array A, there are 

$r = \frac{q^{n-1} - 1}{q - 1}$ (6) 

zero columns (See Lemma 1 in [18]). If there are $\delta$ zero columns corresponding 
to the indices of the 1's in $\{b_i\}$, then they contribute $\delta(q - 1)$ 1's. Thus we have 

$w(A + B) = \sum_{b_i = 0, A_i \ne 0} w(A_i) + \sum_{b_i = 0, A_i = 0} w(A_i) + \sum_{b_i = 1, A_i \ne 0} (q - 1 - w(A_i)) + \sum_{b_i = 1, A_i = 0} (q - 1 - w(A_i)).$ 

Since the $A_i$'s are m-sequences, then for all the non-zero $A_i$'s we have $w(A_i) = 2^{m-1}$. 
Let 

$N_{i,j} = |\{b_k = i,\ char(A_k) = j,\ 0 \le k < d\}|,$ 

where $i, j \in \{0, 1\}$ and 

$char(A_i) = 0$ if $A_i = 0$, and $char(A_i) = 1$ if $A_i \ne 0$. 

Note that 

$N_{1,0} = \delta,$ 
$N_{1,0} + N_{0,0} = r,$ 
$N_{0,0} = r - \delta.$ (7) 

Hence we have 

$N_{1,0} + N_{1,1} = v \Rightarrow N_{1,1} = v - N_{1,0} = v - \delta,$ 
$N_{0,1} + N_{1,1} = d - r \Rightarrow N_{0,1} = d - r - N_{1,1} = d - r - (v - \delta) = d - r - v + \delta.$ 

Thus 

$w(A + B) = 2^{m-1} N_{0,1} + 0 \cdot N_{0,0} + (2^{m-1} - 1) N_{1,1} + (2^m - 1) N_{1,0}$ 
$= 2^{m-1}(d - r - v) + \delta 2^{m-1} + v(2^{m-1} - 1) - \delta 2^{m-1} + \delta + 2^m \delta - \delta$ 
$= 2^{m-1}(d - r) - v 2^{m-1} + v 2^{m-1} - v + 2^m \delta$ 
$= 2^{m-1}(d - r) - v + 2^m \delta = 2^{m-1}(d - r) - v + q\delta.$ (8) 

By noting that $d - r = q^{n-1}$, then we have 

$w(A + B) = 2^{nm-1} - v + 2^m \delta,$ 
which proves the lemma. 



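Theorem 1. With the notation above, if $v = \frac{d-1}{2}$ then

\[ NLG(g) \ge 2^{nm-1} - \frac{d-1}{2}. \]

Proof.

\[ \hat{g}(0, c) = \sum_{x \in E} (-1)^{g(x)} = 1 + \sum_{x \in E^*} (-1)^{g(x)}
   = 1 + (q - 1) \sum_{k=0}^{d-1} (-1)^{b_k} = 1 + (q - 1)(d - 2\,\mathrm{wt}(\mathbf{b})) = q. \]

For $\lambda \ne 0$,

\[ \hat{g}(\lambda, c) = \sum_{x \in E} (-1)^{g(x) + \mathrm{Tr}(\lambda x^c)}
   = 1 + \sum_{x \in E^*} (-1)^{g(x) + \mathrm{Tr}(\lambda x^c)} = 2^{nm} - 2 w(A + B). \]

Note that $0 \le \delta \le r = \frac{q^{n-1} - 1}{q - 1}$ and that
$qr = d - 1$. Thus, by Lemma 1,

\[ w(A + B) \le 2^{nm-1} - \frac{d-1}{2} + qr = 2^{nm-1} + \frac{d-1}{2} \]

and

\[ w(A + B) \ge 2^{nm-1} - \frac{d-1}{2}. \]

By noting that $n > 1$, we have $d - 1 \ge q$ and hence

\[ |\hat{g}(\lambda, c)| \le d - 1, \qquad (9) \]

which proves the theorem.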

Using the construction above for $N = 9$, $m = n = 3$ we get $NLG = 220$. It is
clear that, in order to maximize NLG, we should minimize
$d = \frac{q^n - 1}{q - 1}$. Thus we should choose $m$ to be the larger factor
of $N = n \times m$. For example, let $N = 15 = 3 \times 5$. If we choose
$m = 5$, then we have $NLG = 15856$. However, if we picked $m = 3$, then we get
$NLG = 14044$.
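The three values quoted above are consistent with the expression $2^{N-1} - (d-1)/2$, where $d = (q^n - 1)/(q - 1)$ and $q = 2^m$. The sketch below evaluates that expression; the function name `subfield_nlg` is ours and the formula is taken as an assumption that happens to reproduce the quoted numbers.

```python
def subfield_nlg(N, m):
    """Evaluate 2^(N-1) - (d-1)/2 with d = (q^n - 1)/(q - 1), q = 2^m, N = n*m."""
    if N % m != 0 or m in (1, N):
        raise ValueError("m must be a proper non-trivial factor of N")
    n = N // m
    q = 2 ** m
    d = (q ** n - 1) // (q - 1)  # period of b; always odd because q is even
    return 2 ** (N - 1) - (d - 1) // 2


for N, m in [(9, 3), (15, 5), (15, 3)]:
    print(N, m, subfield_nlg(N, m))
```

Choosing the larger factor for $m$ makes $q$ larger and $d$ smaller, which is why $m = 5$ beats $m = 3$ for $N = 15$.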
5 Case 2: N Is a Prime Number

If $N$ is a prime number then the above sub-field construction is not
applicable. This case is further divided into two cases depending on whether
$2^N - 1$ is a prime number or not.

5.1 Case 2.1: $2^N - 1$ Is a Prime Number



In this case, we base our construction on the Legendre sequence. Let $\gamma$
be a primitive root of a prime $p$, then the Legendre sequence (also called
quadratic residue sequence) of period $p$, $p \equiv 3 \pmod 4$, is defined by

\[ a_i = \begin{cases}
1 \text{ or } 0, & \text{if } i = 0, \\
1, & \text{if } i \text{ is a residue } (i \equiv \gamma^{2s} \bmod p), \\
0, & \text{if } i \text{ is a non-residue } (i \not\equiv \gamma^{2s} \bmod p).
\end{cases} \]

Note that for $N \ge 2$ we always have $2^N - 1 \equiv 3 \bmod 4$. The
properties of Legendre sequences have been extensively studied (e.g., [2], [5],
[6], [12]). Here we are concerned with the following fact:

Fact 1. If a Legendre sequence of period $p \equiv 3 \bmod 4$ is decimated with
$d$, then the original sequence is obtained if $d$ is a quadratic residue mod
$p$, and the reverse sequence is obtained if $d$ is a quadratic non-residue mod
$p$.

This fact can be easily explained by noting that the Boolean function
corresponding to the Legendre sequence has the following trace representation
[12]

\[ f(x) = \sum_{c \in QR} \mathrm{Tr}(x^c), \]

where $QR$ denotes the set of quadratic residues mod $2^N - 1$.

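Example 1. Let $p = 7$, then $\mathbf{a} = \{1110100\}$. The sequences
$\mathbf{a}^{(d)}$ obtained by decimating $\mathbf{a}$ with $d$ are given by

\begin{align*}
\mathbf{a}^{(1)} &= \{1110100\}, \\
\mathbf{a}^{(2)} &= \{1110100\}, \\
\mathbf{a}^{(3)} &= \{1001011\}, \\
\mathbf{a}^{(4)} &= \{1110100\}, \\
\mathbf{a}^{(5)} &= \{1001011\}, \\
\mathbf{a}^{(6)} &= \{1001011\}. \qquad (10)
\end{align*}

Note that $\mathbf{a}^{(1)} = \mathbf{a}^{(2)} = \mathbf{a}^{(4)}$ since 1, 2, 4
are quadratic residues mod 7. Also
$\mathbf{a}^{(3)} = \mathbf{a}^{(5)} = \mathbf{a}^{(6)}$ since 3, 5, 6 are
quadratic non-residues mod 7.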
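Fact 1 and Example 1 can be reproduced with a few lines of code. This is a minimal sketch under the convention $a_0 = 1$ used in the example; the function names `legendre_sequence` and `decimate` are illustrative only.

```python
def legendre_sequence(p, a0=1):
    """One period of the Legendre sequence of prime period p (a_0 is a free choice)."""
    residues = {(i * i) % p for i in range(1, p)}
    return [a0] + [1 if i in residues else 0 for i in range(1, p)]


def decimate(a, d):
    """d-decimation of a periodic sequence: the i-th term becomes a_{d*i mod p}."""
    p = len(a)
    return [a[(d * i) % p] for i in range(p)]


a = legendre_sequence(7)
reverse = [a[(-i) % 7] for i in range(7)]  # sequence read backwards from a_0
for d in range(1, 7):
    dec = decimate(a, d)
    label = "original" if dec == a else "reverse" if dec == reverse else "other"
    print(d, "".join(map(str, dec)), label)
```

Decimations by the residues 1, 2, 4 return the original sequence and those by the non-residues 3, 5, 6 return the reversed one, matching (10).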



The following property follows directly from Fact 1. 
Property 1. Let $f \leftrightarrow \mathbf{a}$ where $\mathbf{a}$ is a Legendre
sequence. Then we have

\[ NLG(f) = \min\{NL(f), NL(g)\}, \qquad (11) \]

where $g \leftrightarrow \mathbf{a}^{(c)}$ and $c$ is any quadratic non-residue
modulo $2^N - 1$.

Example 2. For $N = 5$, $\mathbf{b} = \{1110110111100010101110000100100\}$. If
we let $f \leftrightarrow \mathbf{b}$ with $f(0) = 1$ then we have
$\hat{f}(\lambda, c) \in \{-2, -6, -10, 2, 6, 10\}$ for $c$ in the set of
quadratic residues mod 31, and $\hat{f}(\lambda, c) \in \{-2, -6, 2, 10\}$ for
$c$ not in the set of quadratic residues mod 31. Thus we have $NLG(f) = 11$.

Table 1 shows NLG of the functions obtained from this construction. In this
case, we set $f(0) = 1$. If we set $f(0) = 0$ then we obtain balanced functions
for which NLG is 1 less than the values shown in the table.

Table 2 shows the NLG versus NL distribution for $N = 5$. It is clear that our
Legendre sequence construction achieves the maximum possible NLG. Table 3
shows the same distribution for balanced functions. For $N = 7$ we searched all
functions in the form [7]

\[ f(x) = \sum_{c \in \Gamma(2^N - 1)} \mathrm{Tr}_1^{n_c}(x^c), \]

where $\Gamma(2^N - 1)$ is the set of coset leaders mod $2^N - 1$ and $n_c$ is
the size of the coset containing $c$. Table 4 shows the NLG versus NL
distribution for this case. Table 5 shows the same distribution for the
balanced functions of the same form. Again, it is clear that the construction
above achieves the best possible NLG. For larger values of $N$, our
construction is no longer optimum. For example, for $N = 13$,
$g(x) \leftrightarrow \mathbf{b} = \{i \bmod 2,\ i = 0, 1, \ldots\}$ has
$NLG = 3972$.



Table 1.

N         3     5     7      13       17        19
NLG       1    11    55    3964    64816    259882



Table 2. N = 5

NL\NLG    0     1       2       3        4         5          6          7           8          9         10      11
0        64     0       0       0        0         0          0          0           0          0          0       0
1         0  2048       0       0        0         0          0          0           0          0          0       0
2         0     0   31744       0        0         0          0          0           0          0          0       0
3         0     0       0  317440        0         0          0          0           0          0          0       0
4         0     0       0       0  2301440         0          0          0           0          0          0       0
5         0     0       0       0        0  12888064          0          0           0          0          0       0
6         0     0       0       0    13020         0   57983268          0           0          0          0       0
7         0     0       0    7440        0   3919392          0  211487952           0          0          0       0
8         0     0    2790       0  2396610         0   74021180          0   571246300          0          0       0
9         0   620       0  923180        0  39040780          0  544800200           0  777687700          0       0
10       62     0  149668       0  8474160         0  189406218          0  1022379070          0  191690918       0
11        0  9300       0  606980        0  19419516          0  232492250           0  302968890          0  911896
12      248     0    1302       0   263810         0    3803018          0    20035610          0    3283148       0
Table 3. N = 5 balanced case

NL\NLG    0      2        4         6          8        10
0        62      0        0         0          0         0
2         0  15872        0         0          0         0
4         0      0   892800         0          0         0
6         0      0     6200  19437000          0         0
8         0   1550  1074150  27705010  167500130         0
10       62  77128  3274220  62085560  276057170  34259588
12      248    682   109430   1536050    6312220    735258



Table 4. N = 7

NL\NLG     0     2     8    14    16    22    28    30    36    42    44    46    48    50    52    54
2          0     2     0     0     0     0     0     0     0     0     0     0     0     0     0     0
8          0     0    72     0     0     0     0     0     0     0     0     0     0     0     0     0
14         0     0     0   306     0     0     0     0     0     0     0     0     0     0     0     0
16         0     0     0     0   306     0     0     0     0     0     0     0     0     0     0     0
22         0     0     0     0     0  3264     0     0     0     0     0     0     0     0     0     0
28         0     0     0     0    90     0  6030     0     0     0     0     0     0     0     0     0
30         0     0     0    90     0     0  1269  4761     0     0     0     0     0     0     0     0
36         0     0    72     0     0  3156  4032  2916 23088     0     0     0     0     0     0     0
42         0     6     0   280   460  2448  4715  4408 12927  7012     0     0     0     0     0     0
44         6     0     0   460   280  2448  6696  2427 12927  6248   764     0     0     0     0     0
46         0     0     0     4     1   121   157   174   326   119     0     8     0     0     0     0
48         0     0     1    25    46   578  1232   757  2486  1052     3     0    50     0     0     0
50         0     0   326   918   948 10187 16267  9632 32340 16288    33    90   401   742     0     0
52         0     1   120   549   504  4746  6409  4236 12167  5781    15    25   170   272     5     0
54         2    10    46   228   164  1281  2557  1295  4548  1727     1    21     5    84     1     1
56        10     0    47    47   108   619  1270   570  2241   926    15     0    13    27     0     1



5.2 Case 2.2: N Is a Prime Number and $2^N - 1$ Is a Composite Number

Let $2^N - 1 = dr$, $d > r > 1$. In this case we use a construction similar to
case 1, i.e., we let $f \leftrightarrow \mathbf{b}$ where
$\mathrm{per}(\mathbf{b}) = d$ and $\mathrm{wt}(\mathbf{b}) = \frac{d-1}{2}$.
However, unlike case 1, there is no easy way to determine the weight
distribution of the $A_i$'s because they are no longer m-sequences. Using this
approach for $N = 11$, $d = 89$ we obtained several functions with
$NLG(f) = 980 = 2^{N-1} - \frac{d-1}{2}$.
Table 5. N = 7 balanced case

NL\NLG     0     2    14    16    28    30    42    44    52    54
0          1     0     0     0     0     0     0     0     0     0
2          0     1     0     0     0     0     0     0     0     0
14         0     0    81     0     0     0     0     0     0     0
16         0     0     0    81     0     0     0     0     0     0
28         0     0     0    54  1242     0     0     0     0     0
30         0     0    54     0   561   681     0     0     0     0
42         0     6   160   232  2144  1997  2517     0     0     0
44         6     0   232   160  3067  1074  2510     7     0     0
46         0     0     4     1    66    76    28     0     0     0
48         0     0    21    39   904   561   747     3     0     0
50         0     0    82   115  1220   544   908     1     0     0
52         0     1   549   504  6409  4236  5781    15     5     0
54         2    10   228   164  2557  1295  1727     1     1     1
56        10     0    47   108  1270   570   926    15     0     1



6 Conclusions and Open Problems 

In this paper we presented some methods to construct functions with an odd
number of inputs which achieve large minimum distance to the set of all
bijective monomials. However, since a general upper bound on NLG is not known,
it is interesting to search for other functions that outperform the
constructions presented in this paper. Finding NLG of functions corresponding
to the Legendre sequences is another interesting open problem.
Abstract. In this paper we provide a new generalized construction
method of highly nonlinear t-resilient functions,
$F : \mathbb{F}_2^n \to \mathbb{F}_2^m$. The construction is based on the use
of linear error correcting codes together with multiple output bent functions.
Given a linear $[u, m, t+1]$ code we show that it is possible to construct
n-variable, m-output, t-resilient functions with nonlinearity
$2^{n-1} - 2^{\lceil (n+u-m-1)/2 \rceil}$ for $n \ge u + 3m$. The method
provides currently best known nonlinearity results.

Keywords: Resilient functions, Nonlinearity, Correlation Immunity, 
Stream Ciphers, Linear Codes. 



1 Introduction 

A well known method for constructing a running key generator exploits several 
linear feedback shift registers (LFSR) combined by a nonlinear Boolean function. 
This method is used in design of stream cipher system where each key stream 
bit is added modulo two to each plaintext bit in order to produce the ciphertext 
bit. The Boolean function used in this scenario must satisfy certain properties 
to prevent the cipher from common attacks, such as Siegenthaler’s correlation 
attack [18], linear synthesis attack by Berlekamp and Massey [14] and different 
kinds of approximation attacks [7]. If we use a multiple output Boolean
function instead of a single output one, it is possible to get more than one
bit at each clock and this increases the speed of the system. Such a multiple
output function should possess high values in terms of order of resiliency,
nonlinearity and algebraic degree.

Research on multiple output binary resilient functions has received attention
since the mid eighties [6,1,8,19,2,9,21,12,11,4,5]. The initial works on
multiple output binary resilient functions were directed towards linear
resilient functions. The concept of multiple output resilient functions was
introduced independently by Chor et al. [6] and Bennett et al. [1]. A similar
concept was introduced at the same time for single output Boolean functions by
Siegenthaler [17]. Besides their importance in random sequence generation for
stream cipher systems, these resilient functions have applications in quantum
cryptographic key distribution, fault tolerant distributed computing, etc.

The nonlinearity issue for such multiple output resilient functions was first
discussed in [20]. After that, serious attempts towards construction of
nonlinear resilient functions have been taken in [21,12,11,5]. We here work in
that direction and provide better results than the existing work. For a given
number of input variables $n$, number of output variables $m$, and order of
resiliency $t$, we can construct functions
$F : \mathbb{F}_2^n \to \mathbb{F}_2^m$ that achieve higher nonlinearity values
than existing constructions for almost all choices of $n$, $m$ and $t$.

The paper is organized as follows. Section 2 provides basic definitions and
notations both for 1-output and m-output functions, $m > 1$. In Section 3, we
review some important techniques and results used towards the new construction
of t-resilient functions. Section 4 provides the new construction based on the
use of linear error-correcting codes together with bent functions. Some
numerical values for the constructed functions and comparison with previous
constructions are presented in Section 5. Section 6 concludes this paper.

2 Preliminaries 

For binary strings $S_1, S_2$ of the same length $\lambda$, we denote by
$\#(S_1 = S_2)$ (respectively $\#(S_1 \ne S_2)$) the number of places where
$S_1$ and $S_2$ are equal (respectively unequal). The Hamming distance between
$S_1, S_2$ is denoted by $d(S_1, S_2)$, i.e.,

\[ d(S_1, S_2) = \#(S_1 \ne S_2). \]

Also the Hamming weight or simply the weight of a binary string $S$ is the
number of ones in $S$. This is denoted by $wt(S)$.

By $\mathbb{F}_2^n$ we denote the vector space corresponding to the finite
field $\mathbb{F}_{2^n}$. The addition operator over $\mathbb{F}_2$ is denoted
by $\oplus$ (the XOR operation, which is basically addition modulo 2). By $V_n$
we mean the set of all Boolean functions on n variables, i.e., $V_n$
corresponds to all possible mappings $\mathbb{F}_2^n \to \mathbb{F}_2$. We
interpret a Boolean function $f(x_1, \ldots, x_n)$ as the output column of its
truth table, that is, a binary string of length $2^n$,

\[ [f(0, 0, \ldots, 0), f(1, 0, \ldots, 0), f(0, 1, \ldots, 0), \ldots, f(1, 1, \ldots, 1)]. \]

An n-variable function $f$ is said to be balanced if its output column in the
truth table contains an equal number of 0's and 1's (i.e., $wt(f) = 2^{n-1}$).

An n-variable Boolean function $f(x_1, \ldots, x_n)$ can be considered to be a
multivariate polynomial over $\mathbb{F}_2$. This polynomial can be expressed
as a sum of products representation of all distinct k-th order product terms
($0 \le k \le n$) of the variables. More precisely, $f(x_1, \ldots, x_n)$ can
be written as

\[ f(x_1, \ldots, x_n) = a_0 \oplus \Big( \bigoplus_{i=1}^{n} a_i x_i \Big)
   \oplus \Big( \bigoplus_{1 \le i < j \le n} a_{ij} x_i x_j \Big) \oplus \cdots
   \oplus a_{12 \ldots n}\, x_1 x_2 \cdots x_n, \]

where the coefficients $a_0, a_i, a_{ij}, \ldots, a_{12 \ldots n} \in \{0, 1\}$.
This representation of $f$ is called the algebraic normal form (ANF) of $f$.
The number of variables in the highest order product term with nonzero
coefficient is called the algebraic degree, or simply degree of $f$.

Functions of degree at most one are called affine functions. An affine function
with constant term equal to zero is called a linear function. The set of all
n-variable affine (respectively linear) functions is denoted by $A_n$
(respectively $L_n$). The nonlinearity of an n-variable function $f$ is

\[ nl(f) = \min_{g \in A_n} d(f, g), \]

i.e., the distance from the set of all n-variable affine functions.

Let $x = (x_1, \ldots, x_n)$ and $\omega = (\omega_1, \ldots, \omega_n)$ both
belong to $\mathbb{F}_2^n$. The dot product of $x$ and $\omega$ is defined as

\[ x \cdot \omega = x_1 \omega_1 \oplus \cdots \oplus x_n \omega_n. \]

For a Boolean function $f \in V_n$ the Walsh transform of $f(x)$ is a real
valued function over $\mathbb{F}_2^n$ that can be defined as

\[ W_f(\omega) = \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x) \oplus x \cdot \omega}. \]

Next we define correlation immunity in terms of the characterization provided
in [10]. A function $f(x_1, \ldots, x_n)$ is m-th order correlation immune (CI)
iff its Walsh transform $W_f$ satisfies

\[ W_f(\omega) = 0, \text{ for all } \omega \in \mathbb{F}_2^n \text{ s.t. } 1 \le wt(\omega) \le m. \]

If $f$ is balanced then $W_f(0) = 0$. Balanced m-th order correlation immune
functions are called m-resilient functions. Thus, a function
$f(x_1, \ldots, x_n)$ is m-resilient iff its Walsh transform $W_f$ satisfies

\[ W_f(\omega) = 0, \text{ for all } \omega \in \mathbb{F}_2^n \text{ s.t. } 0 \le wt(\omega) \le m. \]
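The Walsh transform gives a direct way to test these properties on small examples. The sketch below is illustrative only: truth tables follow the ordering given above (so $x_1$ is the least significant bit of the index), the function names are ours, and the nonlinearity is computed through the standard identity $nl(f) = 2^{n-1} - \frac{1}{2}\max_\omega |W_f(\omega)|$, which is not stated explicitly in the text.

```python
def walsh_transform(tt, n):
    """W_f(w) = sum over x of (-1)^(f(x) xor x.w), for every w in F_2^n.

    tt is the truth table of f, indexed by x read as an integer whose
    bit i (from the least significant end) is the variable x_{i+1}.
    """
    def dot(x, w):
        return bin(x & w).count("1") & 1

    return [sum((-1) ** (tt[x] ^ dot(x, w)) for x in range(2 ** n))
            for w in range(2 ** n)]


def nonlinearity(tt, n):
    """Distance to the affine functions via the Walsh spectrum."""
    return 2 ** (n - 1) - max(abs(v) for v in walsh_transform(tt, n)) // 2


def is_resilient(tt, n, m):
    """True iff W_f(w) = 0 for every w of Hamming weight at most m."""
    spectrum = walsh_transform(tt, n)
    return all(spectrum[w] == 0
               for w in range(2 ** n) if bin(w).count("1") <= m)


# x1 + x2 + x3: linear, nondegenerate on three variables, hence 2-resilient.
xor3 = [bin(x).count("1") & 1 for x in range(8)]
# x1*x2 + x3*x4: a bent function, maximally nonlinear but not balanced.
bent4 = [((x & (x >> 1)) ^ ((x >> 2) & (x >> 3))) & 1 for x in range(16)]

print(nonlinearity(xor3, 3), is_resilient(xor3, 3, 2))
print(nonlinearity(bent4, 4), is_resilient(bent4, 4, 0))
```

The two examples sit at opposite ends of the trade-off discussed in this paper: the linear function has maximal resiliency and zero nonlinearity, while the bent function has maximal nonlinearity and is not even balanced.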

Given all these definitions we now start the definitions with respect to the
multiple output Boolean functions $\mathbb{F}_2^n \to \mathbb{F}_2^m$. That is,
in this case we provide the truth table of $m$ different columns of length
$2^n$. Let us consider the function
$F(x) : \mathbb{F}_2^n \to \mathbb{F}_2^m$ such that
$F(x) = (f_1(x), \ldots, f_m(x))$. Then the nonlinearity of $F$ is defined as

\[ nl(F) = \min_{\tau \in \mathbb{F}_2^{m*}} nl\Big( \bigoplus_{j=1}^{m} \tau_j f_j(x) \Big). \]

Here, $\mathbb{F}_2^{m*} = \mathbb{F}_2^m \setminus \{0\}$ and
$\tau = (\tau_1, \ldots, \tau_m)$. Similarly the algebraic degree of $F$ is
defined as

\[ \deg(F) = \min_{\tau \in \mathbb{F}_2^{m*}} \deg\Big( \bigoplus_{j=1}^{m} \tau_j f_j(x) \Big). \]

Now we define an n-variable, m-output, t-resilient function, denoted by
$(n, m, t)$, as follows. A function $F$ is an $(n, m, t)$ resilient function
iff $\bigoplus_{j=1}^{m} \tau_j f_j(x)$ is an $(n, 1, t)$ function (n-variable,
t-resilient Boolean function) for any choice of
$\tau \in \mathbb{F}_2^{m*}$. Since we are also interested in nonlinearity, we
provide the notation $(n, m, t, w)$ for an $(n, m, t)$ resilient function with
nonlinearity $w$. In this paper we concentrate on the nonlinearity value. Thus,
for given size of input parameters $n, m, t$, we construct the functions with
currently best known nonlinearity.



3 Useful Techniques 



In this section we will describe a few existing techniques that will be used later. 
First we recapitulate one result related to linear error correcting codes. The 
following lemma was proved in [11]. We will use it frequently in our construction, 
and therefore it is stated with the proof. 

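Proposition 1. Let $c_0, \ldots, c_{m-1}$ be a basis of a binary $[u, m, t+1]$
linear code $C$. Let $\beta$ be a primitive element in $\mathbb{F}_{2^m}$ and
$(1, \beta, \ldots, \beta^{m-1})$ be a polynomial basis of $\mathbb{F}_{2^m}$.
Define a bijection $\phi : \mathbb{F}_{2^m} \to C$ by

\[ \phi(a_0 + a_1 \beta + \cdots + a_{m-1} \beta^{m-1}) = a_0 c_0 + a_1 c_1 + \cdots + a_{m-1} c_{m-1}. \]

Consider the matrix

\[ A^* = \begin{pmatrix}
\phi(1) & \phi(\beta) & \cdots & \phi(\beta^{m-1}) \\
\phi(\beta) & \phi(\beta^2) & \cdots & \phi(\beta^{m}) \\
\vdots & \vdots & \ddots & \vdots \\
\phi(\beta^{2^m-2}) & \phi(1) & \cdots & \phi(\beta^{m-2})
\end{pmatrix}. \]

For any linear combination of columns (not all zero) of the matrix $A^*$, each
nonzero codeword of $C$ will appear exactly once.

Proof. Since $\phi$ is a bijection, it is enough to show that the matrix

\[ \begin{pmatrix}
1 & \beta & \cdots & \beta^{m-1} \\
\beta & \beta^2 & \cdots & \beta^{m} \\
\vdots & \vdots & \ddots & \vdots \\
\beta^{2^m-2} & 1 & \cdots & \beta^{m-2}
\end{pmatrix} \]

has the property that each element in $\mathbb{F}_{2^m}^*$ will appear once in
any nonzero linear combination of columns of the above matrix.

Any nonzero linear combination of columns can be written as

\[ (c_0 + c_1 \beta + \cdots + c_{m-1} \beta^{m-1})
   \begin{pmatrix} 1 \\ \beta \\ \vdots \\ \beta^{2^m-2} \end{pmatrix} \]

for some $c_0, c_1, \ldots, c_{m-1} \in \mathbb{F}_2$, and this gives the
proof. □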
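Proposition 1 is easy to check by brute force for small parameters. The following is a sketch under assumptions of our own choosing: $m = 3$, the field $\mathbb{F}_{2^3}$ built with $x^3 + x + 1$ (so that $\beta = x$ is primitive), and the $[4, 3, 2]$ even-weight code with an illustrative basis. Codewords are stored as bit masks.

```python
M_BITS = 3
POLY = 0b1011                     # x^3 + x + 1; beta = x (encoded as 2) is primitive
BASIS = [0b0011, 0b0101, 0b1001]  # assumed basis c_0, c_1, c_2 of the [4, 3, 2] code


def gf_mul(a, b):
    """Multiply two elements of GF(2^3) given as bit vectors."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M_BITS) & 1:
            a ^= POLY
    return r


def phi(a):
    """Map a_0 + a_1*beta + a_2*beta^2 to the codeword a_0*c_0 + a_1*c_1 + a_2*c_2."""
    word = 0
    for i, c in enumerate(BASIS):
        if (a >> i) & 1:
            word ^= c
    return word


def build_a_star():
    """Row i of A* is (phi(beta^i), phi(beta^(i+1)), ..., phi(beta^(i+m-1)))."""
    rows, power = [], 1
    for _ in range(2 ** M_BITS - 1):
        row, p = [], power
        for _ in range(M_BITS):
            row.append(phi(p))
            p = gf_mul(p, 2)      # multiply by beta
        rows.append(row)
        power = gf_mul(power, 2)
    return rows


def check_proposition_1(rows):
    """Every nonzero column combination must list each nonzero codeword once."""
    nonzero_codewords = sorted(phi(a) for a in range(1, 2 ** M_BITS))
    for mask in range(1, 2 ** M_BITS):
        column = []
        for row in rows:
            entry = 0
            for j in range(M_BITS):
                if (mask >> j) & 1:
                    entry ^= row[j]
            column.append(entry)
        if sorted(column) != nonzero_codewords:
            return False
    return True


print(check_proposition_1(build_a_star()))
```

The check mirrors the proof: combining columns with coefficients $c_j$ multiplies the whole first column by the fixed nonzero field element $\sum_j c_j \beta^j$, which only permutes the nonzero field elements.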
There are $2^m - 1$ rows in the matrix $A^*$. Let us only concentrate on the
first $2^{m-1}$ rows of this matrix. That is, we consider each column to be of
length $2^{m-1}$. It is clear that for any nonzero linear combination of the
columns, a nonzero codeword of $C$ will appear exactly once in it. Hence, in
the resulting column of length $2^{m-1}$, no codeword will appear more than
once. In this direction, we update Proposition 1 with the following result.

Proposition 2. Let $c_0, \ldots, c_{m-1}$ be a basis of a binary $[u, m, t+1]$
linear code $C$. Let $\beta$ be a primitive element in $\mathbb{F}_{2^m}$ and
$(1, \beta, \ldots, \beta^{m-1})$ be a polynomial basis of $\mathbb{F}_{2^m}$.
Define a bijection $\phi : \mathbb{F}_{2^m} \to C$ by

\[ \phi(a_0 + a_1 \beta + \cdots + a_{m-1} \beta^{m-1}) = a_0 c_0 + a_1 c_1 + \cdots + a_{m-1} c_{m-1}. \]

For $0 \le q \le m - 1$, consider the matrix

\[ D = \begin{pmatrix}
\phi(1) & \phi(\beta) & \cdots & \phi(\beta^{m-1}) \\
\phi(\beta) & \phi(\beta^2) & \cdots & \phi(\beta^{m}) \\
\vdots & \vdots & \ddots & \vdots \\
\phi(\beta^{2^q-1}) & \phi(\beta^{2^q}) & \cdots & \phi(\beta^{2^q+m-2})
\end{pmatrix}. \]

For any linear combination of columns (not all zero) of the matrix $D$, each
nonzero codeword of $C$ will either appear exactly once or not appear at all.

Note that the entries of $D$ are elements from $\mathbb{F}_2^u$. For
convenience, we use a standard index notation to identify the elements of $D$.
That is, $d_{i,j}$ denotes the element in the i-th row and j-th column of $D$,
for $i = 1, \ldots, 2^q$, and $j = 1, \ldots, m$.

Throughout the paper we consider $C$ to be a binary linear $[u, m, t+1]$ code
with a set of basis vectors $c_0, c_1, \ldots, c_{m-1}$. To each codeword
$c_i \in C$, $i = 0, \ldots, 2^m - 1$, we can associate a linear function
$l_{c_i} \in L_u$, where

\[ l_{c_i} = c_i \cdot x = \bigoplus_{k=1}^{u} c_{i,k}\, x_k. \]

This linear function is uniquely determined by $c_i$. Since the minimum
distance of $C$ is $t + 1$, any function $l_{c_i}$ for a nonzero $c_i \in C$
will be nondegenerate on at least $t + 1$ variables, and hence t-resilient.

According to Proposition 1, any column of the matrix $A^*$ can be seen as a
column vector of $2^m - 1$ distinct t-resilient linear functions on $u$
variables. In [11], it was proved that the existence of a set $\mathcal{C}$ of
linear $[u, m, t+1]$ nonintersecting codes of cardinality
$|\mathcal{C}| = \lceil 2^{n-u} / (2^m - 1) \rceil$ was a sufficient and
necessary requirement in the construction of an
$(n, m, t, 2^{n-1} - 2^{u-1})$ function. A set of linear $[u, m, t+1]$ codes
$\mathcal{C} = \{C_1, C_2, \ldots, C_s\}$ such that $C_i \cap C_j = \{0\}$,
$1 \le i < j \le s$, is called a set of linear $[u, m, t+1]$ nonintersecting
codes.

The results in [11] were obtained using a computer search for the set
$\mathcal{C}$. Good results could be obtained only for small size of
$n \le 20$, thus not providing a good construction for arbitrary $n$.

In this initiative our approach is different. We do not try to search for
nonintersecting linear codes. We only consider a single linear code with given
parameters and use a repetition of the codewords in a specific manner. If we
look into the matrix $D$ of Proposition 2, and consider each column as a
concatenation of $2^q$ ($0 \le q \le m - 1$) linear functions on $u$ variables,
then each column can be seen as a Boolean function on $u + q$ variables, i.e.,
$g_j \in V_{u+q}$, $j = 1, \ldots, m$. In the ANF notation the functions
$g_j \in V_{u+q}$ will be given by

\[ g_j(y, x) = \bigoplus_{\tau \in \mathbb{F}_2^q} (y_1 \oplus \tau_1) \cdots (y_q \oplus \tau_q)\, (d_{[\tau]+1,\, j} \cdot x), \]

where $[\tau]$ denotes the integer representation of the vector $\tau$. Once
again note that we have denoted the elements of the matrix $D$ as $d_{i,j}$,
for $i = 1, \ldots, 2^q$, and $j = 1, \ldots, m$. Since each of the constituent
linear functions is nondegenerate on $t + 1$ variables, they are all
t-resilient. Thus, each of the $(u + q)$-variable Boolean functions $g_j$ is
t-resilient. Next we have the following result on nonlinearity.

Proposition 3. Any nonzero linear combination of the functions
$g_1, \ldots, g_m$ has the nonlinearity $2^{u+q-1} - 2^{u-1}$.

Proof. From [16], we have $nl(g_j) = 2^{u+q-1} - 2^{u-1}$ for
$j = 1, \ldots, m$. Moreover, from Proposition 2, it is clear that any nonzero
linear combination of these functions $g_1, \ldots, g_m$ will have the same
property. □

Hence we get the following result related to multiple output functions.

Proposition 4. Given a $[u, m, t+1]$ linear code, it is possible to construct
$(u + q, m, t, 2^{u+q-1} - 2^{u-1})$ resilient functions, for
$0 \le q \le m - 1$.
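A small instance makes Propositions 2 to 4 concrete. The sketch below assumes the $[3, 2, 2]$ even-weight code (so $u = 3$, $m = 2$, $t = 1$), takes $q = 1$, and hard-codes $\phi$ on $\mathbb{F}_{2^2}$; all names are illustrative. Each column of $D$ is read as the truth table of a 4-variable function formed by concatenating two linear functions on 3 variables.

```python
U, M, Q = 3, 2, 1
C0, C1 = 0b011, 0b101              # assumed basis of the [3, 2, 2] even-weight code
PHI = {1: C0, 2: C1, 3: C0 ^ C1}   # phi on GF(4)*; beta is encoded as 2, beta^2 = beta + 1 as 3
BETA_POWERS = [1, 2, 3]            # beta^0, beta^1, beta^2


def parity(v):
    return bin(v).count("1") & 1


def build_d(q):
    """First 2^q rows of A*: row i is (phi(beta^i), ..., phi(beta^(i+m-1)))."""
    return [[PHI[BETA_POWERS[(i + j) % 3]] for j in range(M)]
            for i in range(2 ** q)]


def column_function(d, j):
    """Truth table of g_j: block i of length 2^U is the linear function d[i][j] . x."""
    return [parity(d[i][j] & x) for i in range(len(d)) for x in range(2 ** U)]


def walsh(tt):
    size = len(tt)
    return [sum((-1) ** (tt[x] ^ parity(x & w)) for x in range(size))
            for w in range(size)]


def nonlinearity(tt):
    return len(tt) // 2 - max(abs(v) for v in walsh(tt)) // 2


def is_resilient(tt, t):
    spectrum = walsh(tt)
    return all(spectrum[w] == 0
               for w in range(len(tt)) if bin(w).count("1") <= t)


d = build_d(Q)
g1, g2 = column_function(d, 0), column_function(d, 1)
g12 = [a ^ b for a, b in zip(g1, g2)]
for g in (g1, g2, g12):
    print(nonlinearity(g), is_resilient(g, 1))
```

For these parameters the value predicted by Proposition 4 is $2^{u+q-1} - 2^{u-1} = 8 - 4 = 4$, and every nonzero combination of $g_1, g_2$ should be 1-resilient.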

A simple consequence of Proposition 4 is that for given $m$ and $t$ our goal is
to use a linear code of minimum length, i.e., $u$ should be minimized, since
the nonlinearity is maximized in that case. Throughout this paper the functions
constructed by means of Proposition 4 will be denoted by $g_j$. We immediately
get the following corollary concerning the construction of 1-resilient
functions.

Corollary 1. It is possible to construct an
$(n = 2m, m, 1, nl(F) = 2^{n-1} - 2^{m})$ function $F(x)$.

Proof. It is possible to construct an $[m + 1, m, 2]$ linear code. Putting
$u = m + 1$ and $q = m - 1$, we get $(n, m, 1, 2^{n-1} - 2^{m})$ resilient
functions. □

Thus, using Corollary 1 with $m = 16$, we can construct a 1-resilient function
$F(x) : \mathbb{F}_2^{32} \to \mathbb{F}_2^{16}$ with nonlinearity
$nl(F) = 2^{n-1} - 2^{m} = 2^{31} - 2^{16}$. This function can be used in a
stream cipher system where at each clock it is possible to get a 2-byte output.

Next we look into a more involved technique. For this we need a set of m 
bent functions such that any nonzero linear combination of these bent functions 
will also be a bent function. 

The following proposition is well known and therefore stated without proof 
(for proof see [16]). 
Proposition 5. Let $h(y) \in V_k$ and $g(x) \in V_{n_1}$. Then the nonlinearity
of $f(y, x) = h(y) \oplus g(x)$ is given by

\[ nl(f) = 2^{k}\, nl(g) + 2^{n_1}\, nl(h) - 2\, nl(g)\, nl(h). \]

Next we present the following corollaries which will be useful in the sequel.

Corollary 2. Let $h(y)$ be a bent function on $V_k$, $k = 2m$. Let
$g(x) \in V_{n_1}$ with $nl(g) = 2^{n_1-1} - 2^{u-1}$, for $u \le n_1$. Then
the nonlinearity of $f(y, x) = h(y) \oplus g(x)$ is given by
$nl(f) = 2^{n_1+k-1} - 2^{m} 2^{u-1}$.

Proof. Put $nl(h) = 2^{k-1} - 2^{m-1}$ in Proposition 5. □

Corollary 3. Let $h'(y')$ be a bent function on $V_k$, $k = 2r$, and let $h(y)$
be a function on $V_{k+1}$ given by $h(y) = y_{k+1} \oplus h'(y')$. Let
$g(x) \in V_{n_1}$ with $nl(g) = 2^{n_1-1} - 2^{u-1}$, for $u \le n_1$. Then
the nonlinearity of $f(y, x) = h(y) \oplus g(x)$ is given by
$nl(f) = 2^{n_1+k} - 2^{r+1} 2^{u-1}$.

Proof. Put $nl(h) = 2^{k} - 2^{r}$ in Proposition 5. □

Corollary 4. Let $h(y)$ be a constant function on $V_k$, $k > 0$. Let
$g(x) \in V_{n_1}$ with $nl(g) = 2^{n_1-1} - 2^{u-1}$, for $u \le n_1$. Then
the nonlinearity of $f(y, x) = h(y) \oplus g(x)$ is given by
$nl(f) = 2^{n_1+k-1} - 2^{k} 2^{u-1}$.

Proof. Put $nl(h) = 0$ in Proposition 5. □
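The composition formula of Proposition 5 can be checked numerically on a toy case. The sketch below is only an illustration: it picks $h(y) = y_1 y_2$ (bent on 2 variables) and $g(x) = x_1 x_2 \oplus x_3$ as arbitrary examples, computes nonlinearities from the Walsh spectrum, and compares the direct value for $f = h \oplus g$ with the formula.

```python
def parity(v):
    return bin(v).count("1") & 1


def nonlinearity(tt):
    """2^(n-1) - max|W_f|/2, with the Walsh transform evaluated directly."""
    size = len(tt)
    peak = max(abs(sum((-1) ** (tt[x] ^ parity(x & w)) for x in range(size)))
               for w in range(size))
    return size // 2 - peak // 2


K, N1 = 2, 3
h = [(y & (y >> 1)) & 1 for y in range(2 ** K)]                   # y1*y2
g = [((x & (x >> 1)) ^ (x >> 2)) & 1 for x in range(2 ** N1)]     # x1*x2 + x3

# f(y, x) = h(y) xor g(x); y occupies the high bits of the index, x the low bits.
f = [h[y] ^ g[x] for y in range(2 ** K) for x in range(2 ** N1)]

predicted = 2 ** K * nonlinearity(g) + 2 ** N1 * nonlinearity(h) \
    - 2 * nonlinearity(g) * nonlinearity(h)
print(nonlinearity(h), nonlinearity(g), nonlinearity(f), predicted)
```

Because $h$ and $g$ act on disjoint variables, the Walsh spectrum of $f$ is the product of the two spectra, which is where the formula comes from.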

Thus, using the composition of bent functions with resilient functions, one may
construct highly nonlinear resilient Boolean functions on a higher number of
variables. The question is whether we may use the same technique for the
construction of multiple output functions. In other words, we want to find a
set of bent functions of cardinality $2^m - 1$, say
$B = \{b_1, \ldots, b_{2^m-1}\}$, with basis $b_1, \ldots, b_m$, such that
$\bigoplus_{j=1}^{m} \tau_j b_j \in B$, for $\tau \in \mathbb{F}_2^{m*}$.

Now we discuss the construction in more detail [15]. Let $A$ be of size
$2^m \times m$ given by $A = \binom{0}{A^*}$, where $A^*$ is a matrix
constructed by means of Proposition 1 using $c_0, \ldots, c_{m-1}$, that spans
an $[m, m, 1]$ code $C$ with the unity matrix $I$ as the generator matrix. Now
consider each column of the matrix $A$, which can be seen as a concatenation of
$2^m$ distinct linear functions on $m$ variables. This is a Maiorana-McFarland
type bent function in $2m$ variables. Also using Proposition 1, it is clear
that any nonzero linear combination of these bent functions will provide a bent
function. The algebraic degree of this class of bent functions is equal to $m$.
Thus, we have the following result.

Proposition 6. It is possible to get $m$ distinct bent functions on $2m$
variables, say $b_1, \ldots, b_m$, such that any nonzero linear combination of
these bent functions will provide a bent function. Also,
$\deg(\bigoplus_{i=1}^{m} \tau_i b_i) = m$, for $\tau \in \mathbb{F}_2^{m*}$.
Example 1. Let $m = 2$ and $c_0 = (01)$, $c_1 = (10)$. We use an irreducible
polynomial $p(z) = z^2 + z + 1$ to create the field $\mathbb{F}_{2^2}$. Then it
can be shown that the matrix $A$ is given by

\[ A = \begin{pmatrix}
0 & 0 \\
c_0 & c_1 \\
c_1 & c_0 + c_1 \\
c_0 + c_1 & c_0
\end{pmatrix}. \]

In the truth table notation, let us consider the 4-variable bent function
$g_1(x)$ as the concatenation of the 2-variable linear functions
$0, x_1, x_2, x_1 \oplus x_2$ and similarly, $g_2(x)$ as the concatenation of
$0, x_2, x_1 \oplus x_2, x_1$. Then the function $g_1(x) \oplus g_2(x)$ is also
bent, which is a concatenation of $0, x_1 \oplus x_2, x_1, x_2$.
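The bentness claims in this example can be confirmed directly from the Walsh spectrum: a function on $2m$ variables is bent exactly when every Walsh coefficient has absolute value $2^m$. The following sketch builds the three truth tables by concatenation; the helper names are ours.

```python
def parity(v):
    return bin(v).count("1") & 1


def linear_tt(mask, n):
    """Truth table of the linear function mask . x on n variables (x1 = lowest bit)."""
    return [parity(mask & x) for x in range(2 ** n)]


def concatenate(masks, n):
    """Concatenate the truth tables of the linear functions given by masks."""
    table = []
    for mask in masks:
        table.extend(linear_tt(mask, n))
    return table


def is_bent(tt):
    size = len(tt)
    n = size.bit_length() - 1
    target = 2 ** (n // 2)
    return n % 2 == 0 and all(
        abs(sum((-1) ** (tt[x] ^ parity(x & w)) for x in range(size))) == target
        for w in range(size))


X1, X2 = 0b01, 0b10
g1 = concatenate([0, X1, X2, X1 ^ X2], 2)
g2 = concatenate([0, X2, X1 ^ X2, X1], 2)
g12 = [a ^ b for a, b in zip(g1, g2)]

print(is_bent(g1), is_bent(g2), is_bent(g12))
print(g12 == concatenate([0, X1 ^ X2, X1, X2], 2))
```

In each of the three tables the four concatenated linear functions are pairwise distinct, which is the Maiorana-McFarland condition that makes the result bent.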

Also note the following extension of Proposition 6.

Proposition 7. It is possible to get $m$ distinct bent functions on $2p$
variables ($p \ge m$), say $b_1, \ldots, b_m$, such that any nonzero linear
combination of these bent functions will provide a bent function. Also,
$\deg(\bigoplus_{i=1}^{m} \tau_i b_i) = p$, for $\tau \in \mathbb{F}_2^{m*}$.

With these results we present our construction method in the following sec- 
tion. 



4 New Construction 

In this section we will first provide the general construction idea using a
$[u, m, t+1]$ linear code and then we will use specific codes towards the
construction of resilient functions of specific orders. Let us first discuss
the idea informally. We take the matrix $D$ as described in Proposition 2. Now
it is clear that each column of $D$ can be seen as a $(u + q)$-variable
function with order of resiliency $t$ and nonlinearity
$2^{u+q-1} - 2^{u-1}$. Let us name these functions $g_1, \ldots, g_m$. From
Proposition 4, it is known that any nonzero linear combination of these
functions will provide a $(u + q)$-variable function $g$ with order of
resiliency $t$ and nonlinearity $2^{u+q-1} - 2^{u-1}$.

Now we concentrate on n-variable functions. It is clear that the
$(u + q)$-variable function needs to be repeated $2^{n-u-q}$ times to make an
n-variable function. We will thus use an $(n - u - q)$-variable function and
XOR it with the $(u + q)$-variable function to get an n-variable function. Also
to get the maximum possible nonlinearity in this method, the
$(n - u - q)$-variable function must be of maximum possible nonlinearity. We
will use $m$ different functions $h_1, \ldots, h_m$ and use the compositions
$f_1 = h_1 \oplus g_1, \ldots, f_m = h_m \oplus g_m$, to get $m$ different
n-variable functions. Thus any nonzero linear combination of
$f_1, \ldots, f_m$ can be seen as the XOR of linear combinations of
$h_1, \ldots, h_m$ and linear combinations of $g_1, \ldots, g_m$. In order to
get a high nonlinearity of the vector output function we will need high
nonlinearity of the functions $h_1, \ldots, h_m$ and also high nonlinearity for
their linear combinations.

If $(n - u - q)$ is even, we can use bent functions $h_1, \ldots, h_m$.
Importantly, we require $m$ different bent functions (as in Proposition 6) such
that the nonzero linear combinations will also produce bent functions. For this
we need $n - u - q \ge 2m$ (see Proposition 7). If $(n - u - q)$ is odd, we can
use bent functions $b_j$ of $(n - u - q - 1)$ variables and take
$h_j = x_n \oplus b_j$. This requires the condition
$n - u - q - 1 \ge 2m$ to get $m$ distinct bent functions as in Proposition 7.

It may very well happen that the value of $n - u - q$ is less than $2m$, and in
such a scenario it may not be possible to get $m$ bent functions with the
desired property. In such a situation we may not get very good nonlinearity. We
formalize the results in the following theorem.



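Theorem 1. Given a linear $[u, m, t+1]$ code, for $n \ge u$ it is possible to
construct an $(n, m, t, nl(F))$ function $F = (f_1, \ldots, f_m)$, where

\[ nl(F) = \begin{cases}
2^{n-1} - 2^{u-1}, & u \le n < u + m; & (1) \\
2^{n-1} - 2^{n-m}, & u + m \le n < u + 2m; & (2) \\
2^{n-1} - 2^{u+m-1}, & u + 2m \le n < u + 3m; & (3) \\
2^{n-1} - 2^{(n+u-m-1)/2}, & n \ge u + 3m - 1,\ n - u - m + 1 \text{ even}; & (4) \\
2^{n-1} - 2^{(n+u-m)/2}, & n \ge u + 3m,\ n - u - m + 1 \text{ odd}. & (5)
\end{cases} \]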



Proof. We consider different cases separately. We will use functions
$g_1, \ldots, g_m$ on $u + q$ variables which are basically concatenations of
$2^q$ distinct linear functions on $u$ variables. These linear functions are
nondegenerate on at least $t + 1$ variables. From Proposition 3, we get that for
any $\tau \in F^{m*}$,
$nl(\bigoplus_{j=1}^{m} \tau_j g_j) = 2^{u+q-1} - 2^{u-1}$.
Next we consider $m$ different functions $h_1, \ldots, h_m$ on $(n - u - q)$
variables. We will choose those functions in such a manner that, for any
$\tau \in F^{m*}$, $nl(\bigoplus_{j=1}^{m} \tau_j h_j)$ is high. Mostly we will
use bent functions as in Proposition 6 and Proposition 7 in our construction.
Now we construct the vector output function $F = (f_1, \ldots, f_m)$ where
$f_j = h_j \oplus g_j$. For any $\tau \in F^{m*}$,
$\bigoplus_{j=1}^{m} \tau_j f_j(x)$ can be written as
$\bigoplus_{j=1}^{m} \tau_j h_j \oplus \bigoplus_{j=1}^{m} \tau_j g_j$. This can
be done since the sets of variables are distinct. The input variables of the
$g_j$ are $x_1, \ldots, x_{u+q}$ and the input variables of the $h_j$ are
$x_{u+q+1}, \ldots, x_n$.
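The step that combines $g_j$ and $h_j$ on disjoint variables rests on a standard fact: the largest absolute Walsh value of $g \oplus h$ is the product of those of $g$ and $h$. The following is a minimal sketch that checks this on toy functions. The helper names and the particular $g$ and $h$ are illustrative assumptions, not the functions of the paper's construction.

```python
from itertools import product

def truth_table(func, n):
    """Truth table of an n-variable Boolean function (inputs as 0/1 tuples)."""
    return [func(x) & 1 for x in product((0, 1), repeat=n)]

def max_abs_walsh(tt):
    """Largest |W_f(w)|, computed with a fast Walsh-Hadamard transform of (-1)^f."""
    w = [1 - 2 * bit for bit in tt]
    step = 1
    while step < len(w):
        for start in range(0, len(w), 2 * step):
            for j in range(start, start + step):
                a, b = w[j], w[j + step]
                w[j], w[j + step] = a + b, a - b
        step *= 2
    return max(abs(v) for v in w)

def nonlinearity(tt):
    """nl(f) = 2^(n-1) - max|W_f| / 2."""
    return (len(tt) - max_abs_walsh(tt)) // 2

# Toy g: concatenation of the linear functions x3 and x2 + x3 (u = 2, q = 1).
g = lambda x: (x[0] & x[1]) ^ x[2]
# Toy h: a bent function on 4 variables.
h = lambda x: (x[0] & x[1]) ^ (x[2] & x[3])
# f = g + h on disjoint variable sets (7 variables in total).
f = lambda x: g(x[:3]) ^ h(x[3:])

tt_g, tt_h, tt_f = truth_table(g, 3), truth_table(h, 4), truth_table(f, 7)
print(nonlinearity(tt_g), nonlinearity(tt_h), nonlinearity(tt_f))
```

Here the toy $g$ has nonlinearity $2^{u+q-1} - 2^{u-1} = 2$, in line with the value quoted from Proposition 3.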

1. Here, $u \le n < u + m$. By Proposition 4, we construct an
$(n = u + q, m, t, 2^{n-1} - 2^{u-1})$ function $F$.

2. Let $u + m \le n < u + 2m$. Here we take $q = m - 1$ in Proposition 2.
The functions $g_j$ are of $u + m - 1$ variables. Thus we need to repeat each
function $2^{n-u-m+1}$ times. We will use functions $h_j$ of $(n - u - m + 1)$
variables which are constant functions. We know
$nl(g_j) = 2^{u+m-2} - 2^{u-1}$. Hence,
$nl(f_j) = 2^{n-u-m+1}(2^{u+m-2} - 2^{u-1}) = 2^{n-1} - 2^{n-m}$ as in
Corollary 4.

3. Let $u + 2m \le n < u + 3m$. We take $q$ such that $n - u - q = 2m$. In
this case the $g_j$ are of $u + q$ variables. We take $m$ bent functions $h_j$,
each of $2m$ variables as in Proposition 6. We know
$nl(g_j) = 2^{u+q-1} - 2^{u-1}$ and $nl(h_j) = 2^{2m-1} - 2^{m-1}$. Thus, if we
consider the function $F = (f_1, \ldots, f_m)$, we get
$nl(F) = 2^{n-1} - 2^{u+m-1}$ as in Corollary 2.

4. For $n \ge u + 3m - 1$, $n - u - m + 1$ even, we use $q = m - 1$ and a set
of bent functions on $n - u - m + 1$ variables. Note that in this case
$n - u - m + 1 \ge 2m$. Thus we will get a set of $m$ bent functions as in
Proposition 7. Here, $nl(g_j) = 2^{u+m-2} - 2^{u-1}$ and
$nl(h_j) = 2^{(n-u-m+1)-1} - 2^{\frac{n-u-m+1}{2}-1}$. Thus we get
$nl(F) = 2^{n-1} - 2^{\frac{u+n-m-1}{2}}$ as in Corollary 2.







5. For $n \ge u + 3m$, $n - u - m + 1$ odd, we use $q = m - 1$ and a set of
bent functions on $n - u - m$ variables, say $b_1, \ldots, b_m$, as in
Proposition 7. Note that in this case $n - u - m \ge 2m$. We construct
$h_j = x_n \oplus b_j$. Thus we get $nl(g_j) = 2^{u+m-2} - 2^{u-1}$ and
$nl(h_j) = 2^{n-u-m} - 2^{\frac{n-u-m}{2}}$. In this case, the nonlinearity is
$nl(F) = 2^{n-1} - 2^{\frac{u+n-m}{2}}$ as in Corollary 3. □
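The five cases can be collected into a small calculator. This is a minimal sketch, assuming the exponents are exactly those obtained in the proof from $nl(g_j)$ and $nl(h_j)$; the function name is hypothetical.

```python
def theorem1_nonlinearity(u, m, n):
    """nl(F) for an (n, m, t) function built from a [u, m, t+1] code.

    Assumption: the case analysis follows the proof above, with q = m - 1
    in the last two cases. The name is illustrative only.
    """
    if n < u:
        raise ValueError("the construction needs n >= u")
    if n < u + m:                         # case 1
        return 2 ** (n - 1) - 2 ** (u - 1)
    if n < u + 2 * m:                     # case 2
        return 2 ** (n - 1) - 2 ** (n - m)
    if n < u + 3 * m:                     # case 3 (agrees with case 4 at n = u + 3m - 1)
        return 2 ** (n - 1) - 2 ** (u + m - 1)
    k = n - u - m + 1                     # number of variables of the h_j
    if k % 2 == 0:                        # case 4: bent h_j on k variables
        return 2 ** (n - 1) - 2 ** (u + k // 2 - 1)
    # case 5: h_j is a bent function on k - 1 variables plus one extra variable
    return 2 ** (n - 1) - 2 ** (u + (k - 1) // 2)

# Parameters taken from the examples in Section 5.
print(theorem1_nonlinearity(7, 4, 24) == 2 ** 23 - 2 ** 13)
print(theorem1_nonlinearity(17, 8, 36) == 2 ** 35 - 2 ** 24)
```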

Note that Corollary 1 in Section 3 is a special case of the item 1 in the 
above theorem. Next we consider the algebraic degree of functions constructed 
by means of Theorem 1. 

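Theorem 2. In reference to Theorem 1, the algebraic degree of the function $F$
is given by

\[ 2 \le \deg(F) \le n - u + 1, \quad u \le n < u + m; \quad (1) \]
\[ 2 \le \deg(F) \le m, \quad u + m \le n < u + 2m; \quad (2) \]
\[
\deg(F) =
\begin{cases}
m, & u + 2m \le n < u + 3m; \quad (3) \\
\frac{n-u-m+1}{2}, & n \ge u + 3m - 1,\ n - u - m + 1 \text{ even}; \quad (4) \\
\frac{n-u-m}{2}, & n \ge u + 3m,\ n - u - m + 1 \text{ odd}. \quad (5)
\end{cases}
\]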

Proof. Let us consider any nonzero linear combination $f$ of $f_1, \ldots, f_m$.
Also we denote any nonzero linear combination of the $h_j$ as $h$ and that of
the $g_j$ as $g$. It is clear that
$\deg(F) = \deg(f) = \max(\deg(h), \deg(g))$, as $h, g$ are functions on
distinct sets of input variables.

1. Here $f$ can be seen as the concatenation of $2^q$ linear functions
($0 \le q < m$) of $u$ variables each. The exact calculation of the algebraic
degree will depend in a complicated way on the choice of the codewords from
$C$. However, it is clear that the function is always nonlinear and hence the
algebraic degree must be $\ge 2$. Also the function $f$ will have degree at
most $q + 1$. Here $q = n - u$, which gives the result.

2. In this case $q = m - 1$. Now $f$ can be seen as the $2^{n-u-m+1}$ times
repetition of the function $g$, where $g$ is the concatenation of $2^q$ linear
functions ($0 \le q < m$) of $u$ variables each. The exact calculation of the
algebraic degree will depend in a complicated way on the choice of the
codewords from $C$. However, it is clear that the function is always nonlinear
and hence $\deg(f) \ge 2$. Furthermore, the function $g$ will have degree at
most $q + 1$. Thus the result.

3. In this case $\deg(f) = \max(\deg(h), \deg(g))$. Now, $\deg(h) = m$ as we
consider $2m$-variable bent functions with the property described in
Proposition 6. Also, $\deg(g)$ is at most $q + 1$. Now,
$u + 2m \le n < u + 3m$, which gives $q < m$. Hence $\deg(f) = m$.

4. In this case $\deg(h) = \frac{n-u-m+1}{2}$ (from Proposition 7) and
$\deg(g) \le q + 1 = m$. Here $n \ge u + 3m - 1$, i.e.,
$n - u - m + 1 \ge 2m$, which gives $\frac{n-u-m+1}{2} \ge m$. Thus
$\deg(f) = \frac{n-u-m+1}{2}$.

5. In this case $\deg(h) = \frac{n-u-m}{2}$ and $\deg(g) \le q + 1 = m$. Here
$n \ge u + 3m$, i.e., $n - u - m \ge 2m$, which gives
$\frac{n-u-m}{2} \ge m$. Thus $\deg(f) = \frac{n-u-m}{2}$. □

At this point, let us comment on the construction of resilient functions of
order 1 and 2. First we concentrate on 1-resilient functions. Let $C_1$ be an
$[m + 1, m, 2]$ linear code in systematic form, i.e., $C_1 = (I \mid 1)$, where
$I$ is an identity matrix of size $m \times m$, and $1$ is a column vector of
all ones. In this case, we have $u = m + 1$. Then we can apply Theorem 1 to
this $[m + 1, m, 2]$ code.

Next we look into the construction of 2-resilient functions. From the theory
of error correcting codes we know that for any $l \ge 3$ there exists a linear
$[u = 2^l - 1, m = 2^l - l - 1, 3]$ Hamming code. The codewords from such a
code provide the construction of $(n, m, 2, nl(F))$ nonlinear resilient
functions $F$. Also, given $l$, it is possible to obtain a sequence of linear
codes of different length and dimension. In other words, given a linear
$[2^l - 1, 2^l - l - 1, 3]$ Hamming code, the generated sequence of codes is
$[2^l - 1 - j, 2^l - l - 1 - j, 3]$, for $j = 0, 1, \ldots, 2^{l-1} - 1$.
These codes with Theorem 1 can be used to construct 2-resilient functions with
high nonlinearity. Note that this construction of 2-resilient functions is not
the best obtainable with this technique, because there exist better linear
$[n, m, 3]$ codes than those provided by the Hamming design.
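As a small check of the code parameters used here, the sketch below builds the parity-check matrix of the $[2^l - 1, 2^l - l - 1, 3]$ Hamming code and brute-forces its dimension and minimum distance for $l = 3$. The helper names are assumptions made for illustration, and exhaustive search is only practical for small $l$.

```python
from itertools import product

def hamming_parity_check(l):
    """l x (2^l - 1) parity-check matrix whose columns are all nonzero l-bit vectors."""
    return [[(c >> i) & 1 for c in range(1, 2 ** l)] for i in range(l)]

def codewords(H):
    """All binary vectors v with H v^T = 0 (exhaustive search)."""
    length = len(H[0])
    for v in product((0, 1), repeat=length):
        if all(sum(h & x for h, x in zip(row, v)) % 2 == 0 for row in H):
            yield v

def min_distance(H):
    """Minimum Hamming weight over the nonzero codewords."""
    return min(sum(c) for c in codewords(H) if any(c))

H = hamming_parity_check(3)
words = list(codewords(H))
# A minimum distance d = t + 1 corresponds to resiliency order t = d - 1.
print(len(H[0]), len(words), min_distance(H))
```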

The construction of resilient functions using the simplex code has been
discussed in [5]. A simplex code [13] is a $[2^m - 1, m, 2^{m-1}]$ linear code,
whose minimal distance is maximal. By concatenating each codeword $v$ times,
one can get a $[v(2^m - 1), m, v 2^{m-1}]$ linear code. Given Theorem 1, one
can use such codes for the construction of functions with order of resiliency
$v 2^{m-1} - 1$.

For fixed $u$ and $m$, a linear $[u, m, t + 1]$ code that achieves the maximum
possible value of $t + 1$ is the best suited for our construction, as it
maximizes the order of resiliency. A table of such codes for $u, m < 127$ is
available in [3].

5 Results and Comparison 

In this section we compare the results obtained using the techniques presented 
in the previous section with the existing results. It was demonstrated that for a 
low order of resiliency and a moderate number of input variables the construc- 
tion in [11] was superior to the other constructions, namely the constructions 
in [12,21]. However, the main disadvantage of the construction in [11] is the 
necessity of finding a set of nonintersecting linear codes of certain dimension. 
This may cause a large complexity for the search programs, since there is no 
theoretical basis for finding such a set. Next we show that our results are supe- 
rior in comparison to [21,12,5]. Note that the construction of [12] gives higher 
nonlinearity than [21], whereas the construction of [21] provides larger order of 
resiliency than [12]. 

Theorem 3. [21, Corollary 6] If there exists a linear $(n, m, t)$ resilient
function, then there exists a nonlinear
$(n, m, t, 2^{n-1} - 2^{n - \lceil m/2 \rceil})$ resilient function whose
algebraic degree is $m - 1$.

Note that given any $[u, m, t+1]$ code, it is easy to construct a linear
$(u, m, t)$ function. Thus, using the method of [21] it is possible to
construct a nonlinear $(u, m, t)$ function also. Consequently, for $n = u$, the
result of [21] provides the presently best known parameters. Note that there
are some cases (when the value of $n$ is very close to $u$, which falls under
item 1 of Theorem 1) where the results of [21] are better than ours. This is
when $u - 1 > n - \lceil m/2 \rceil$, i.e., $n < u + \lceil m/2 \rceil - 1$.

However, if we fix the values of $m, t$, then for the values of $n$ that fall
under items 2, 3, 4 and 5 of Theorem 1 (and also under item 1 when
$n > u + \lceil m/2 \rceil - 1$), our nonlinearity exceeds that of [21]. Hence,
as we choose $n$ comparatively larger than $u$,
$n > u + \lceil m/2 \rceil - 1$, the advantage of [21] decreases and our method
provides a better result. Moreover, items 3, 4, 5 of Theorem 2 show that the
algebraic degree of our construction is better than the $(m - 1)$ given in
[21]. We present an example here for the comparison.

We know the existence of a $[36, 8, 16]$ linear code. Hence, it is easy to get
a linear $(36, 8, 15)$ resilient function. Using the method of [21] it is
possible to get a $(36, 8, 15, 2^{36-1} - 2^{36-4} = 2^{35} - 2^{32})$
function. Moreover, it has been mentioned in [12, Proposition 19] how to get a
$(36, 8, 15, 2^{35} - 2^{32})$ function using the technique of [21]. Our method
cannot provide a function with these parameters. Let us now construct a
function on a larger number of input variables, say $n = 43$, for the same $m$
and $t$. For $n = 43$ and $t = 15$ the best known linear code has the
parameters $[43, 12, 16]$. Then, with the construction in [21], it is possible
to construct a $(43, 12, 15, 2^{42} - 2^{37})$ and consequently a
$(43, 8, 15, 2^{42} - 2^{37})$ function using fewer output columns. In our
construction we start with a $[36, 8, 16]$ code and, applying item 1 of
Theorem 1, we obtain a $(43, 8, 15, 2^{42} - 2^{35})$ function, which provides
better nonlinearity.

Theorem 4. [12, Theorem 18] For any even $l$ such that $l \ge 2m$, if there
exists an $(n - l, m, t)$ function $\Phi(x)$, then there exists an
$(n, m, t, 2^{n-1} - 2^{n - \frac{l}{2} - 1})$ resilient function.

Note that if there exists a linear $[u = n - l, m, t + 1]$ code, then by the
above theorem [12] it is possible to get the nonlinearity
$2^{n-1} - 2^{n - \frac{l}{2} - 1} = 2^{n-1} - 2^{\frac{u+n}{2} - 1}$. Items 4
and 5 of our Theorem 1 provide better nonlinearity than [12]. Also a closer
look reveals that our construction outperforms the result of [12] for any
$n > u$, with the same quality result for $n = u + 2m$.

Next we compare our result with a very recent work [5]. 

Theorem 5. [5, Theorem 5] Given a linear $[u, m, t + 1]$ code ($0 < m < u$),
for any nonnegative integer $\Delta$, there exists a $(u + \Delta + 1, m, t)$
resilient function with algebraic degree $\Delta$, whose nonlinearity is
greater than or equal to
$2^{u+\Delta} - 2^{u} \lfloor \sqrt{2^{u+\Delta+1}} \rfloor + 2^{u-1}$.

Thus it is clear that given a linear $[u, m, t + 1]$ code, the above
construction provides an
$(n, m, t, 2^{n-1} - 2^{u + \frac{n}{2}} + 2^{u-1})$ resilient function. Note
that the construction provides some nonlinearity only when
$n - 1 > u + \frac{n}{2}$, i.e., $n > 2u + 2$. It is very clear that our
construction of
$(n, m, t, 2^{n-1} - 2^{\lfloor \frac{u+n-m}{2} \rfloor})$ resilient functions
for $n \ge u + 3m$ presents much better nonlinearity than that of [5]. However,
comparing our result in Theorem 2 with [5, Theorem 5], it is clear that in
terms of algebraic degree the result of [5] is superior to our result. It will
be of interest to construct functions with nonlinearity as good as our results
and with better algebraic degree as given in [5].

5.1 Examples 

Next we compare the results with specific examples. Let us start with the
construction of a $(24, 4, 2, nl(F))$ function $F(x)$. Given $m = 4$, it is
possible to construct a nonlinear function $F(x)$ using the technique in [21]
with $nl(F) \ge 2^{23} - 2^{22}$. We know the existence of a $[7, 4, 3]$ linear
Hamming code [13]. This gives a $(7, 4, 2)$ resilient function. Using the
construction in [12], we obtain a function $F(x)$ with
$nl(F) \ge 2^{23} - 2^{15}$.

In our notation, $u = 7$, $m = 4$, $t = 2$. In this case,
$n - u - m + 1 = 24 - 7 - 4 + 1 = 14$ and $n = 24 \ge u + 3m - 1 = 18$. Thus
from Theorem 1, we get the nonlinearity $2^{23} - 2^{13}$. Thus, our technique
provides the currently best known nonlinearity.

Starting with a $[7, 4, 3]$ code, if we use the construction of [5], we will
get a $(24, 4, 2, 2^{23} - 2^{19} + 2^{6})$ resilient function. To obtain the
same value of nonlinearity using the construction in [11], one is forced to
find $|C| = \lceil 2^{n-u} / (2^m - 1) \rceil = \lceil 2^{10} / 15 \rceil$
nonintersecting linear $[14, 4, 3]$ codes (here $u = 14$ is the length of these
codes), and this is computationally an extremely hard problem to solve.
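The numbers in this example can be recomputed from the bounds of Theorem 3 and Theorem 5. This is a minimal sketch, assuming the bound of [21] has exponent $n - \lceil m/2 \rceil$ and the bound of [5] is evaluated at $n = u + \Delta + 1$; the function names are hypothetical.

```python
from math import isqrt

def bound_theorem3(n, m):
    """Lower bound of [21] as read from Theorem 3 (assumed exponent n - ceil(m/2))."""
    return 2 ** (n - 1) - 2 ** (n - (m + 1) // 2)

def bound_theorem5(u, n):
    """Lower bound of [5] as read from Theorem 5, with n = u + Delta + 1."""
    return 2 ** (n - 1) - 2 ** u * isqrt(2 ** n) + 2 ** (u - 1)

n, m, u = 24, 4, 7
ours = 2 ** 23 - 2 ** 13          # value stated in the text for Theorem 1
print(bound_theorem3(n, m), bound_theorem5(u, n), ours)
```

For these parameters the three values are ordered as the text describes: the bound of [21] is the smallest, that of [5] lies in between, and the value from Theorem 1 is the largest.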

In [12] the construction of a $(36, 8, 5, nl(F))$ function was discussed. Using
a linear $[18, 8, 6]$ code the authors proved the existence of a
$(36, 8, 5, nl(F))$ function, where $nl(F) \ge 2^{35} - 2^{26}$. We use a
linear $[17, 8, 6]$ code [3] to construct a $(36, 8, 5, 2^{35} - 2^{24})$
function (here $n \ge u + 2m$) by means of Theorem 1. Using the same linear
code, we can obtain a $(40, 8, 5, 2^{39} - 2^{24})$ function (here
$n \ge u + 3m - 1$).

The nonlinearity of $(36, 8, t)$ resilient functions has been used as an
important example in [12,11]. We here compare our results with the existing
ones.

In Table 1 the results of [12] are the best known existing construction
results; our results match them for $t = 7$ and improve on them for every
other listed order. The results of [11] are not construction results: they
show that resilient functions with such parameters exist, but a construction
of functions with such parameters is not available in [11]. Note that, for
resiliency of orders 3, 2 and 1 our construction provides better results than
the existential bound in [11]. In the last row of Table 1, we describe the
linear codes [3] which we use for our construction.



Table 1. Nonlinearity of (36, 8, t) resilient functions.

Order of resiliency t   7                  5                  4                  3                  2                  1
Nonlinearity of [12]    $2^{35}-2^{27}$    $2^{35}-2^{26}$    $2^{35}-2^{25}$    $2^{35}-2^{24}$    $2^{35}-2^{23}$    $2^{35}-2^{22}$
Nonlinearity of [11]    $2^{35}-2^{22}$    $2^{35}-2^{23}$    $2^{35}-2^{22}$    $2^{35}-2^{22}$    $2^{35}-2^{21}$    $2^{35}-2^{21}$
Our nonlinearity        $2^{35}-2^{27}$    $2^{35}-2^{24}$    $2^{35}-2^{23}$    $2^{35}-2^{20}$    $2^{35}-2^{19}$    $2^{35}-2^{18}$
The codes               [20,8,8]           [17,8,6]           [16,8,5]           [13,8,4]           [12,8,3]           [9,8,2]

6 Conclusion

A new generalized construction of highly nonlinear resilient multiple output
functions has been provided. The construction is based on the use of linear codes
together with a specific set of bent functions. We show that our construction
outperforms all previous constructions for almost all choices of input parameters
n, m, t. Many examples are provided demonstrating the better nonlinearity
attained using this new construction in comparison to the previous ones. It will
be of interest to construct functions with better nonlinearity than our method
or to show that some of our constructions provide optimized nonlinearity which
cannot be improved further.

Abstract. In stream ciphers, we should use a t-resilient Boolean function
$f(X)$ with large nonlinearity to resist fast correlation attacks and
linear attacks. Further, in order to be secure against an extension of
linear attacks, we wish to find a t-resilient function $f(X)$ which has a
large distance even from low degree Boolean functions. From this point
of view, we define a new covering radius $\hat{\rho}(t, r, n)$ as the maximum
distance between a t-resilient function $f(X)$ and the r-th order Reed-Muller
code $RM(r, n)$. We next derive its lower and upper bounds. Finally, we
present a table of numerical bounds for $\hat{\rho}(t, r, n)$.

Keywords: Nonlinearity, t-resilient function, Reed-Muller code, cover- 
ing radius, stream cipher. 



1 Introduction 

Nonlinearity and resiliency are two of the most important cryptographic criteria
of Boolean functions which are used in stream ciphers and block ciphers. The
nonlinearity of a Boolean function $f(X)$, denoted by $nl(f)$, is the distance
between $f(X)$ and the set of affine (linear) functions. It must be large to
avoid linear attacks.

$f(X)$ is said to be balanced if
$\#\{X \mid f(X) = 0\} = \#\{X \mid f(X) = 1\} = 2^{n-1}$, where
$X = (x_1, \ldots, x_n)$. Suppose that $f(X)$ is balanced even if any $t$
variables $x_{i_1}, \ldots, x_{i_t}$ are fixed to any $t$ values
$b_{i_1}, \ldots, b_{i_t}$. Then $f(X)$ is called a t-resilient function.
$f(X)$ should be t-resilient for large $t$ to resist fast correlation attacks in
stream ciphers such as combination generators and nonlinear filter generators.

Therefore, $f(X)$ should satisfy both large nonlinearity $nl(f)$ and large
resiliency. Recently, Sarkar and Maitra derived an upper bound on $nl(f)$ of
t-resilient functions [5].

We further observe that $f(X)$ should not be approximated even by low degree
Boolean functions $g(X)$ in order to be secure against an extension of linear
attacks [3]. Note that the set of n-variable Boolean functions $g(X)$ such that
$\deg(g) \le r$ is identical to an error correcting code known as the r-th order
Reed-Muller code $RM(r, n)$.

Consequently, we wish to find a t-resilient function $f(X)$ which has a large
distance even from $RM(r, n)$ for small $r$. On the other hand, the covering
radius of $RM(r, n)$, denoted by $\rho(r, n)$, is defined as the maximum
distance between $f(X)$ and $RM(r, n)$, where the maximum is taken over all
n-variable Boolean functions $f(X)$. That is,

\[ \rho(r, n) \stackrel{\mathrm{def}}{=} \max_{f(X)} d(f(X), RM(r, n)). \]

In this paper, we introduce a new definition of covering radius of $RM(r, n)$
from this point of view. We define the t-resilient covering radius of
$RM(r, n)$, denoted by $\hat{\rho}(t, r, n)$, as the maximum distance between a
t-resilient function $f(X)$ and $RM(r, n)$, where the maximum is taken over all
t-resilient functions $f(X)$. That is,

\[ \hat{\rho}(t, r, n) \stackrel{\mathrm{def}}{=} \max_{t\text{-resilient } f(X)} d(f(X), RM(r, n)). \]

We then derive lower bounds and upper bounds on $\hat{\rho}(t, r, n)$. The
result of Sarkar and Maitra [5] is obtained as a special case of one of our
upper bounds. Finally, we present a table of numerical bounds for
$\hat{\rho}(t, r, n)$ which are derived from our bounds.

2 Preliminaries 

Let $X = (x_1, \ldots, x_n)$.

2.1 Nonlinearity of Boolean Functions 

Define the distance between two Boolean functions $f(X)$ and $g(X)$ as

\[ d(f(X), g(X)) \stackrel{\mathrm{def}}{=} \#\{X \mid f(X) \ne g(X)\}. \]

Define the weight of $f(X)$ as

\[ w(f) \stackrel{\mathrm{def}}{=} \#\{X \mid f(X) = 1\}. \]

A Boolean function of the form $a_0 \oplus a_1 x_1 \oplus \cdots \oplus a_n x_n$
is called an affine function. Let $A_n$ denote the set of n-variable affine
functions. That is,

\[ A_n \stackrel{\mathrm{def}}{=} \{a_0 \oplus a_1 x_1 \oplus \cdots \oplus a_n x_n\}. \]

The nonlinearity of $f(X)$, denoted by $nl(f)$, is defined as the distance
between $f(X)$ and $A_n$. That is,

\[ nl(f) \stackrel{\mathrm{def}}{=} \min_{g(X) \in A_n} d(f(X), g(X)). \]

Cryptographically secure Boolean functions should have large nonlinearity to
resist linear attacks. The following upper bound is known:

\[ nl(f) \le 2^{n-1} - 2^{\frac{n}{2}-1}. \]

It is tight if $n$ is even. A function $f(X)$ which satisfies the above bound
with equality is called a bent function.
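The definition of $nl(f)$ can be evaluated directly for small $n$ by trying every affine function. The following is a minimal brute-force sketch; the function name and the example are illustrative assumptions. The example is the standard 4-variable bent function $x_1 x_2 \oplus x_3 x_4$.

```python
from itertools import product

def nonlinearity_by_definition(f, n):
    """Minimum Hamming distance from f to the 2^(n+1) affine functions."""
    points = list(product((0, 1), repeat=n))
    best = len(points)
    for coeffs in product((0, 1), repeat=n + 1):
        a0, a = coeffs[0], coeffs[1:]
        # Distance between f and a0 + a1*x1 + ... + an*xn (arithmetic mod 2).
        dist = sum(
            f(x) != (a0 ^ (sum(ai & xi for ai, xi in zip(a, x)) & 1))
            for x in points
        )
        best = min(best, dist)
    return best

bent = lambda x: (x[0] & x[1]) ^ (x[2] & x[3])
print(nonlinearity_by_definition(bent, 4))
```

For $n = 4$ the bound $2^{n-1} - 2^{n/2 - 1}$ equals 6, and the example meets it.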



2.2 t-Resilient Function and its Nonlinearity 

$f(X)$ is said to be balanced if

\[ \#\{X \mid f(X) = 1\} = \#\{X \mid f(X) = 0\} = 2^{n-1}. \]

Suppose that $f(X)$ is balanced even if any $t$ variables
$x_{i_1}, \ldots, x_{i_t}$ are fixed to any values $b_{i_1}, \ldots, b_{i_t}$.
Then $f(X)$ is called a t-resilient function. Boolean functions used in stream
ciphers should be t-resilient for large $t$ to resist fast correlation
attacks.
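The definition of t-resiliency translates into an exhaustive check for small $n$: fix every choice of $t$ variables to every combination of values and test that the restricted function is balanced. This is a minimal sketch; the function name and examples are illustrative assumptions.

```python
from itertools import combinations, product

def is_resilient(f, n, t):
    """True if f stays balanced whenever any t of its n variables are fixed."""
    if t >= n:
        return False  # a restriction with no free variable cannot be balanced
    points = list(product((0, 1), repeat=n))
    for positions in combinations(range(n), t):
        for values in product((0, 1), repeat=t):
            restricted = [
                x for x in points
                if all(x[p] == v for p, v in zip(positions, values))
            ]
            if 2 * sum(f(x) for x in restricted) != len(restricted):
                return False
    return True

xor3 = lambda x: x[0] ^ x[1] ^ x[2]        # linear in three variables
mixed = lambda x: (x[0] & x[1]) ^ x[2]     # balanced, but only 0-resilient
print(is_resilient(xor3, 3, 2), is_resilient(mixed, 3, 0), is_resilient(mixed, 3, 1))
```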

Therefore, $f(X)$ should satisfy both large nonlinearity $nl(f)$ and large
resiliency. Sarkar and Maitra derived an upper bound on $nl(f)$ of t-resilient
functions [5].

Proposition 2.1. Let $f(X)$ be a t-resilient function and $l(X)$ be an affine
function. Then

\[ d(f(X), l(X)) \equiv 0 \bmod 2^{t+1}. \]

Proposition 2.2. Suppose that $f(X)$ is a t-resilient function. If $n$ is even,
then

\[
nl(f) \le
\begin{cases}
2^{n-1} - 2^{t+1} & \text{if } t + 1 > n/2 - 1 \\
2^{n-1} - 2^{\frac{n}{2}-1} - 2^{t+1} & \text{if } t + 1 \le n/2 - 1
\end{cases}
\]

They derived a similar bound for odd $n$.



3 Reed-Muller Code and Its Covering Radius 

Any Boolean function can be written in the algebraic normal form

\[
g(X) = a_0 \oplus \bigoplus_{1 \le i \le n} a_i x_i
\oplus \bigoplus_{1 \le i < j \le n} a_{ij} x_i x_j
\oplus \cdots \oplus a_{1,2,\ldots,n} x_1 x_2 \cdots x_n .
\]

The degree of $g(X)$, denoted by $\deg(g)$, is the degree of the highest degree
term in the algebraic normal form. The r-th order Reed-Muller code $RM(r, n)$ is
identical to the set of n-variable Boolean functions $g(X)$ such that
$\deg(g) \le r$.
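The algebraic normal form and the degree can be computed from a truth table with the binary Möbius transform. This is a minimal sketch; the helper names and the indexing convention (bit $i$ of the table index is $x_{i+1}$) are assumptions made for illustration.

```python
def table(func, n):
    """Truth table indexed by j, where bit i of j is the value of x_{i+1}."""
    return [func([(j >> i) & 1 for i in range(n)]) & 1 for j in range(1 << n)]

def anf_coefficients(tt):
    """Binary Moebius transform: entry j is the coefficient of prod_{i in j} x_{i+1}."""
    a = list(tt)
    n = len(a).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for j in range(len(a)):
            if j & step:
                a[j] ^= a[j ^ step]
    return a

def degree(tt):
    """Degree of the highest-degree ANF term (0 for a constant function)."""
    coeffs = anf_coefficients(tt)
    return max((bin(j).count("1") for j, c in enumerate(coeffs) if c), default=0)

cubic = lambda x: (x[0] & x[1] & x[2]) ^ x[0]
affine = lambda x: 1 ^ x[0] ^ x[2]
print(degree(table(cubic, 3)), degree(table(affine, 3)))
```

With this helper, membership in $RM(r, n)$ is the test `degree(tt) <= r`.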

The covering radius of $RM(r, n)$, denoted by $\rho(r, n)$, is defined as the
maximum distance between $f(X)$ and $RM(r, n)$, where the maximum is taken over
all n-variable Boolean functions $f(X)$. That is,

\[ \rho(r, n) \stackrel{\mathrm{def}}{=} \max_{f(X)} d(f(X), RM(r, n)), \]

where

\[ d(f(X), RM(r, n)) \stackrel{\mathrm{def}}{=} \min_{\deg(g) \le r} d(f(X), g(X)). \]

Note that $\rho(1, n)$ is equal to the maximum nonlinearity of n-variable
Boolean functions.

In the following table, the best known numerical bounds for $\rho(r, n)$ with
$n \le 7$ are presented.

         n = 1   n = 2   n = 3   n = 4   n = 5   n = 6    n = 7
r = 1    0       1       2       6 [4]   12      28       56
r = 2            0       1       2       6       18 [6]   40-44 [2]
r = 3                    0       1       2       8        20-23 [1]
r = 4                            0       1       2        8
r = 5                                    0       1        2
r = 6                                            0        1
r = 7                                                     0


It is easy to see the following propositions. 

Proposition 3.1. Any Boolean function $f(x_1, \ldots, x_n)$ such that
$\deg(f) \le r$ is written as

\[ f(X) = f_1(x_1, \ldots, x_{n-1}) \oplus x_n \cdot f_2(x_1, \ldots, x_{n-1}), \]

where $\deg(f_1) \le r$ and $\deg(f_2) \le r - 1$.

Proposition 3.2. $d(f, g \oplus h) \ge d(f, g) - w(h)$.

Proof.

\begin{align*}
d(f, g \oplus h) &= w(f \oplus g \oplus h) \\
&\ge w(f \oplus g) - w(h) \\
&= d(f, g) - w(h)
\end{align*}

□



4 New Covering Radius for t-Resilient Functions 

4.1 New Covering Radius 

Boolean functions $f(X)$ used in stream ciphers and block ciphers should not be
approximated by affine (linear) functions to resist linear attacks. This leads
to the notion of the nonlinearity $nl(f)$, which is defined as the distance
between $f(X)$ and the set of affine (linear) functions.

We also observe that $f(X)$ should not be approximated even by low degree
Boolean functions to resist an extension of linear attacks [3]. Remember that
$RM(r, n)$ is identical to the set of $g(X)$ such that $\deg(g) \le r$, and the
covering radius of $RM(r, n)$ is the maximum distance between $f(X)$ and
$RM(r, n)$. That is,

\[ \rho(r, n) = \max_{f(X)} d(f(X), RM(r, n)). \]

Further, $f(X)$ should be t-resilient to be secure against fast correlation
attacks in stream ciphers.

In this section, we introduce a new definition of covering radius of
$RM(r, n)$ from this point of view. We define the t-resilient covering radius of
$RM(r, n)$, denoted by $\hat{\rho}(t, r, n)$, as the maximum distance between a
t-resilient function $f(X)$ and $RM(r, n)$, where the maximum is taken over all
t-resilient functions $f(X)$. That is,

\[ \hat{\rho}(t, r, n) = \max_{t\text{-resilient } f(X)} d(f(X), RM(r, n)). \]

Note that $\hat{\rho}(t, r, n) = 0$ if $n - t - 1 \le r$. This follows
immediately from Siegenthaler's inequality on resilient functions [7].

We then derive lower bounds and upper bounds on $\hat{\rho}(t, r, n)$.



4.2 Lower Bounds on $\hat{\rho}(t, r, n)$

In this subsection, we derive lower bounds on $\hat{\rho}(t, r, n)$.

Theorem 4.1.

\[
\hat{\rho}(t, r, n) \ge
\begin{cases}
2\rho(r, n-1) & \text{if } t = 0 \\
2\hat{\rho}(t-1, r, n-1) & \text{if } t \ge 1
\end{cases}
\]

Proof. (1) $t = 0$. Suppose that $\rho(r, n-1)$ is achieved by
$f'(x_1, \ldots, x_{n-1})$. That is,

\[ d(f', RM(r, n-1)) = \rho(r, n-1). \]

Let $f(x_1, \ldots, x_n) = f'(x_1, \ldots, x_{n-1}) \oplus x_n$. Then it is easy
to see that $f(x_1, \ldots, x_n)$ is balanced. Therefore, $f(X)$ is a
0-resilient function. Further,

\begin{align*}
\hat{\rho}(t, r, n) &\ge d(f, RM(r, n)) \\
&= d(f', RM(r, n-1)) + d(f', RM(r, n-1)) \\
&= 2\rho(r, n-1)
\end{align*}

(2) $t \ge 1$. Suppose that $\hat{\rho}(t-1, r, n-1)$ is achieved by a
$(t-1)$-resilient function $f'(x_1, \ldots, x_{n-1})$. That is,

\[ d(f', RM(r, n-1)) = \hat{\rho}(t-1, r, n-1). \]

Let $f(x_1, \ldots, x_n) = f'(x_1, \ldots, x_{n-1}) \oplus x_n$. Then it is easy
to see that $f(x_1, \ldots, x_n)$ is a t-resilient function. The rest of the
proof is similar to the above.

□

Corollary 4.1. $\hat{\rho}(t, r, n) \ge 2^{t+1} \rho(r, n - t - 1)$.



Theorem 4.2. Suppose that there exists $f(x_1, \ldots, x_n)$ such that

\[ d(f, RM(r, n)) \ge k \]

and

\[ f(x_1, \ldots, x_n) = f_1(x_1, \ldots, x_m) \oplus f_2(x_l, \ldots, x_n) \]

for some $f_1$ and $f_2$, where $1 \le m \le n - 1$, $2 \le l \le n - 1$. Let

\[ t = \min(n - m - 1, l - 2). \]

Then

\[ \hat{\rho}(t, r + 1, n + 1) \ge k. \]

Proof. Let

\begin{align*}
h_1(x_1, \ldots, x_n) &= f_1(x_1, \ldots, x_m) \oplus x_{m+1} \oplus \cdots \oplus x_n \\
h_2(x_1, \ldots, x_n) &= x_1 \oplus \cdots \oplus x_{l-1} \oplus f_2(x_l, \ldots, x_n)
\end{align*}

It is easy to see that $h_1(X)$ is $(n - m - 1)$-resilient and $h_2(X)$ is
$(l - 2)$-resilient. Then define

\[ h(X, x_{n+1}) = h_1(X) \oplus x_{n+1} \cdot (h_1(X) \oplus h_2(X)), \]

where $X = (x_1, \ldots, x_n)$.

We first show that $h$ is t-resilient. For $x_{n+1} = 0$,

\[ h(X, 0) = h_1(X), \]

which is $(n - m - 1)$-resilient. For $x_{n+1} = 1$,

\[ h(X, 1) = h_2(X), \]

which is $(l - 2)$-resilient. Therefore, $h(X, x_{n+1})$ is t-resilient, where
$t = \min(n - m - 1, l - 2)$.

We next prove that $d(h, RM(r + 1, n + 1)) \ge k$. Choose $g(X, x_{n+1})$ such
that $\deg(g) \le r + 1$ and

\[ d(h, g) = d(h, RM(r + 1, n + 1)). \]

From Proposition 3.1, $g$ is written as

\[ g(X, x_{n+1}) = g_1(X) \oplus x_{n+1} \cdot g_2(X) \]

for some $g_1 \in RM(r + 1, n)$ and $g_2 \in RM(r, n)$. Then from
Proposition 3.2,

\begin{align*}
d(h, g) &= d(h, g)|_{x_{n+1}=0} + d(h, g)|_{x_{n+1}=1} \\
&= d(h_1, g_1) + d(h_2, g_1 \oplus g_2) \\
&= d(h_1, g_1) + d(h_1 \oplus h_2, h_1 \oplus g_1 \oplus g_2) \\
&\ge d(h_1, g_1) + d(h_1 \oplus h_2, g_2) - w(h_1 \oplus g_1) \\
&= d(h_1 \oplus h_2, g_2)
\end{align*}

Let $l(X) = x_1 \oplus \cdots \oplus x_{l-1} \oplus x_{m+1} \oplus \cdots \oplus x_n$.
Then

\begin{align*}
d(h, g) &\ge d(h_1 \oplus h_2, g_2) \\
&= d(f_1 \oplus f_2 \oplus l, g_2) \\
&= d(f_1 \oplus f_2, g_2 \oplus l) \\
&\ge d(f, RM(r, n))
\end{align*}

because $g_2 \in RM(r, n)$ and $g_2 \oplus l \in RM(r, n)$. Hence

\begin{align*}
d(h, RM(r + 1, n + 1)) &= d(h, g) \\
&\ge d(f, RM(r, n)) \\
&\ge k
\end{align*}

□
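The construction in this proof can be tried on a small instance. The sketch below takes $f = x_1 x_2 \oplus x_3 x_4$ (so $n = 4$, $r = 1$, $m = 2$, $l = 3$, $k = 6$ and $t = 1$), builds $h$ on five variables, and checks by exhaustive search that $h$ is 1-resilient and lies at distance at least 6 from the second-order Reed-Muller code. The helper names, the bitmask encoding, and the choice of instance are assumptions made for illustration.

```python
from itertools import combinations

N = 5                      # h has n + 1 = 5 variables
POINTS = range(1 << N)

def bits(p):
    """Point index -> list [x1, ..., x5] (bit i of p is x_{i+1})."""
    return [(p >> i) & 1 for i in range(N)]

def h(x):
    h1 = (x[0] & x[1]) ^ x[2] ^ x[3]        # f1(x1, x2) + x3 + x4
    h2 = x[0] ^ x[1] ^ (x[2] & x[3])        # x1 + x2 + f2(x3, x4)
    return h1 ^ (x[4] & (h1 ^ h2))          # h = h1 + x5 * (h1 + h2)

def is_resilient(func, t):
    """Exhaustive check that func stays balanced when any t variables are fixed."""
    for pos in combinations(range(N), t):
        for vals in range(1 << t):
            pts = [p for p in POINTS
                   if all(((p >> i) & 1) == ((vals >> j) & 1) for j, i in enumerate(pos))]
            if 2 * sum(func(bits(p)) for p in pts) != len(pts):
                return False
    return True

def mask(func):
    """Pack a truth table into one integer, one bit per point."""
    return sum(func(bits(p)) << p for p in POINTS)

def distance_to_rm(fmask, r):
    """Minimum distance to RM(r, N), enumerating codewords in Gray-code order."""
    basis = [mask(lambda x, idx=idx: int(all(x[i] for i in idx)))
             for d in range(r + 1) for idx in combinations(range(N), d)]
    word, best = 0, bin(fmask).count("1")
    for k in range(1, 1 << len(basis)):
        word ^= basis[(k & -k).bit_length() - 1]
        best = min(best, bin(fmask ^ word).count("1"))
    return best

print(is_resilient(h, 1), distance_to_rm(mask(h), 2))
```

The search over $RM(2, 5)$ visits $2^{16}$ codewords, which is why the instance is kept this small.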



Corollary 4.2. $\hat{\rho}(0, 3, 7) \ge 18$.

Proof. Let

\[ f(x_1, \ldots, x_6) = (x_1 x_2 x_3 \oplus x_1 x_4 x_5) \oplus (x_2 x_3 x_6 \oplus x_2 x_4 x_6 \oplus x_3 x_5 x_6). \]

Then it is known that [6]

\[ d(f, RM(2, 6)) = 18. \]

Let $r = 2$, $n = 6$, $m = 5$ and $l = 2$ in Theorem 4.2. Then we obtain this
corollary. □



Corollary 4.3. Suppose that $n = 4k + s$, where $0 \le s \le 3$ and $k \ge 1$.
Let $t = 2k - 1$. Then

\[
\hat{\rho}(t, 2, n + 1) \ge
\begin{cases}
2^{n-1} - 2^{\frac{n}{2}-1} & \text{if } n \text{ is even} \\
2^{n-1} - 2^{\frac{n-1}{2}} & \text{if } n \text{ is odd}
\end{cases}
\]

Proof. For even $n$, let

\[ f(x_1, \ldots, x_n) = x_1 x_2 \oplus x_3 x_4 \oplus \cdots \oplus x_{n-1} x_n. \]

Then it is known that

\[ d(f, RM(1, n)) = 2^{n-1} - 2^{\frac{n}{2}-1} \]

($f$ is a bent function). In Theorem 4.2, let

\begin{align*}
f_1(x_1, \ldots, x_{2k}) &= x_1 x_2 \oplus \cdots \oplus x_{2k-1} x_{2k}, \\
f_2(x_{2k+1}, \ldots, x_n) &= x_{2k+1} x_{2k+2} \oplus \cdots \oplus x_{n-1} x_n.
\end{align*}

Then $m = 2k$ and $l = 2k + 1$. Hence

\begin{align*}
t &= \min(n - 2k - 1, 2k + 1 - 2) \\
&= \min(4k + s - 2k - 1, 2k - 1) \\
&= 2k - 1
\end{align*}

because $s \ge 0$.

For odd $n$, let

\[ f(x_1, \ldots, x_n) = x_1 x_2 \oplus x_3 x_4 \oplus \cdots \oplus x_{n-2} x_{n-1}. \]

Then for any $g(x_1, \ldots, x_n)$ such that $\deg(g) \le 1$,

\begin{align*}
d(f, g) &= d(f, g)|_{x_n=0} + d(f, g)|_{x_n=1} \\
&\ge d(f, RM(1, n-1)) + d(f, RM(1, n-1)) \\
&= 2\left(2^{n-2} - 2^{\frac{n-1}{2}-1}\right) \\
&= 2^{n-1} - 2^{\frac{n-1}{2}}
\end{align*}

Hence

\[ d(f, RM(1, n)) \ge 2^{n-1} - 2^{\frac{n-1}{2}}. \]

Finally, similarly to the case of even $n$, we have $t = 2k - 1$.

Therefore, this corollary holds from Theorem 4.2. □

4.3 Upper Bounds on $\hat{\rho}(t, r, n)$

In this subsection, we derive upper bounds on $\hat{\rho}(t, r, n)$.

Theorem 4.3. For $t \ge 1$,

\[ \hat{\rho}(t, r, n) \le \hat{\rho}(t - 1, r, n - 1) + \rho(r - 1, n - 1). \]

Proof. Any $f(x_1, \ldots, x_n)$ and $g(x_1, \ldots, x_n)$ are written as

\begin{align*}
f(x_1, \ldots, x_n) &= f_1(x_1, \ldots, x_{n-1}) \oplus x_n \cdot f_2(x_1, \ldots, x_{n-1}), \\
g(x_1, \ldots, x_n) &= g_1(x_1, \ldots, x_{n-1}) \oplus x_n \cdot g_2(x_1, \ldots, x_{n-1}).
\end{align*}

Then

\begin{align*}
d(f, g) &= d(f, g)|_{x_n=0} + d(f, g)|_{x_n=1} \\
&= d(f_1, g_1) + d(f_1 \oplus f_2, g_1 \oplus g_2) \\
&= d(f_1, g_1) + d(f_1 \oplus f_2 \oplus g_1, g_2)
\end{align*}

Now let $f$ be any t-resilient function such that

\[ d(f, RM(r, n)) = \hat{\rho}(t, r, n). \quad (1) \]

Choose $g_1$ such that $\deg(g_1) \le r$ and

\[ d(f_1, g_1) = d(f_1, RM(r, n - 1)) \]

arbitrarily. Choose $g_2$ such that $\deg(g_2) \le r - 1$ and

\[ d(f_1 \oplus f_2 \oplus g_1, g_2) = d(f_1 \oplus f_2 \oplus g_1, RM(r - 1, n - 1)) \]

arbitrarily. Then

(1). $\deg(g) \le r$. Therefore,

\[ d(f, g) \ge d(f, RM(r, n)) = \hat{\rho}(t, r, n). \]

(2). $f_1$ is $(t - 1)$-resilient. Therefore,

\[ d(f_1, g_1) = d(f_1, RM(r, n - 1)) \le \hat{\rho}(t - 1, r, n - 1). \]

(3). It is easy to see

\[ d(f_1 \oplus f_2 \oplus g_1, g_2) \le \rho(r - 1, n - 1). \]

Therefore,

\begin{align*}
\hat{\rho}(t, r, n) &\le d(f, g) \\
&= d(f_1, g_1) + d(f_1 \oplus f_2 \oplus g_1, g_2) \\
&\le \hat{\rho}(t - 1, r, n - 1) + \rho(r - 1, n - 1).
\end{align*}

□

Lemma 4.1. Suppose that $f(X)$ is balanced and $\deg(g(X)) \le n - 1$, where
$X = (x_1, \ldots, x_n)$. Then

\[ d(f, g) \equiv 0 \bmod 2. \]

Proof. Note that

\[ d(f, g) = w(f) + w(g) - 2w(f \times g). \]

Since $\deg(g) \le n - 1$, it holds that $w(g) \equiv 0 \bmod 2$. Therefore, it
holds that $d(f, g) \equiv 0 \bmod 2$. □

We finally generalize Proposition 2.1 [5] and Proposition 2.2 [5]. 

Theorem 4.4. Let $1 \le r \le n - 2$ and $0 \le t \le n - r - 2$. If
$f(x_1, \ldots, x_n)$ is a t-resilient function, then

\[ d(f, RM(r, n)) \equiv 0 \bmod 2^{\lfloor t/r \rfloor + 1}. \]

Proof. We show that

\[ d(f(X), g(X)) \equiv 0 \bmod 2^{\lfloor t/r \rfloor + 1} \quad (2) \]

for any $g(X)$ such that $\deg(g) \le r$, where $X = (x_1, \ldots, x_n)$. Let
$a(g, r)$ be the number of degree $r$ terms $x_{i_1} \cdots x_{i_r}$ involved
in $g$.

Base step on $r$. If $r = 1$, then the theorem follows from Proposition 2.1.

Inductive step on $r$. Assume that (2) is true for $r = r_0$. We will show
that it is true for $r = r_0 + 1$.

Base step on $a(g, r_0 + 1)$. If $a(g, r_0 + 1) = 0$, then
$g(x_1, \ldots, x_n) \in RM(r_0, n)$. By the induction hypothesis on $r$, we have

\begin{align*}
d(f, g) &\equiv 0 \bmod 2^{\lfloor t/r_0 \rfloor + 1} \\
&\equiv 0 \bmod 2^{\lfloor t/(r_0+1) \rfloor + 1}.
\end{align*}

Inductive step on $a(g, r_0 + 1)$. Assume that (2) is true for
$a(g, r_0 + 1) \le a_0$. We show that (2) is true for
$a(g, r_0 + 1) = a_0 + 1$. Without loss of generality, we assume that

\[ g(x_1, \ldots, x_n) = x_1 \cdots x_{r_0+1} \oplus g^*(x_1, \ldots, x_n) \]

for some $g^*$ such that $a(g^*, r_0 + 1) = a_0$.
Define

\begin{align*}
f_{b_1 \ldots b_{r_0+1}} &= f(b_1, \ldots, b_{r_0+1}, x_{r_0+2}, \ldots, x_n) \\
g^*_{b_1 \ldots b_{r_0+1}} &= g^*(b_1, \ldots, b_{r_0+1}, x_{r_0+2}, \ldots, x_n) \\
d_{b_1 \ldots b_{r_0+1}} &= d(f_{b_1 \ldots b_{r_0+1}}, g^*_{b_1 \ldots b_{r_0+1}})
\end{align*}

Then we have

\begin{align*}
d(f, g^*) &= d_{0 \ldots 0} + \cdots + d_{1 \ldots 10} + d_{1 \ldots 1} = 2^{\lfloor t/(r_0+1) \rfloor + 1} k \\
d(f, g) &= d_{0 \ldots 0} + \cdots + d_{1 \ldots 10} + \left(2^{n-(r_0+1)} - d_{1 \ldots 1}\right)
\end{align*}

for some integer $k$ by the induction hypothesis on $a(g, r_0 + 1)$. Therefore
we have

\[ d(f, g) = 2^{\lfloor t/(r_0+1) \rfloor + 1} k + 2^{n-(r_0+1)} - 2 d_{1 \ldots 1}. \]

From our condition on the parameters, it holds that

\[ t \le n - (r_0 + 1) - 2. \]

Therefore, we have

\[ n - (r_0 + 1) \ge t + 2 \ge \left\lfloor \frac{t}{r_0 + 1} \right\rfloor + 1. \]

Hence

\[ 2^{n-(r_0+1)} \equiv 0 \bmod 2^{\lfloor t/(r_0+1) \rfloor + 1}. \]

Further, from the induction hypothesis on $a(g, r_0 + 1)$, we have

\begin{align*}
d_{1 \ldots 1} &\equiv 0 \bmod 2^{\lfloor (t-(r_0+1))/(r_0+1) \rfloor + 1} \\
&\equiv 0 \bmod 2^{\lfloor t/(r_0+1) \rfloor}
\end{align*}

since $f_{1 \ldots 1}$ is a $(t - (r_0 + 1))$-resilient function and
$a(g^*_{1 \ldots 1}, r_0 + 1) \le a_0$. Therefore,

\[ 2 d_{1 \ldots 1} \equiv 0 \bmod 2^{\lfloor t/(r_0+1) \rfloor + 1}. \]
Finally, putting all things together, we have

\[ d(f, g) \equiv 0 \bmod 2^{\lfloor t/r \rfloor + 1} \]

for any $g$ such that $\deg(g) \le r$. Therefore, this theorem holds. □

Corollary 4.4. If $r \le n - t - 2$, then

\[ \hat{\rho}(t, r, n) \le \rho(r, n) - \left(\rho(r, n) \bmod 2^{\lfloor t/r \rfloor + 1}\right). \]

Proof. It is clear that $\hat{\rho}(t, r, n) \le \rho(r, n)$. Then apply
Theorem 4.4. □

Corollary 4.5. Let $Y = \hat{\rho}(t - 1, r, n - 1) + \rho(r - 1, n - 1)$. Then

\[ \hat{\rho}(t, r, n) \le Y - \left(Y \bmod 2^{\lfloor t/r \rfloor + 1}\right). \]

Proof. From Theorem 4.3 and Theorem 4.4. □
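The bounds of Corollary 4.1 and Corollary 4.4 are simple enough to evaluate mechanically. This is a minimal sketch that uses only the $r = 1$ row of the covering radius table from Section 3; the dictionary and function names are assumptions made for illustration.

```python
# Known values rho(1, n) for n = 1..7, copied from the table in Section 3.
RHO = {(1, 1): 0, (1, 2): 1, (1, 3): 2, (1, 4): 6, (1, 5): 12, (1, 6): 28, (1, 7): 56}

def lower_bound_cor41(t, r, n):
    """Corollary 4.1: rho_hat(t, r, n) >= 2^(t+1) * rho(r, n - t - 1)."""
    return 2 ** (t + 1) * RHO[(r, n - t - 1)]

def upper_bound_cor44(t, r, n):
    """Corollary 4.4: round rho(r, n) down to a multiple of 2^(floor(t/r) + 1)."""
    if r > n - t - 2:
        raise ValueError("Corollary 4.4 requires r <= n - t - 2")
    modulus = 2 ** (t // r + 1)
    return RHO[(r, n)] - RHO[(r, n)] % modulus

# 2-resilient functions of 6 variables against affine functions (r = 1).
print(lower_bound_cor41(2, 1, 6), upper_bound_cor44(2, 1, 6))
```

For $t = 2$, $r = 1$, $n = 6$ the upper bound of 24 coincides with the value $2^{n-1} - 2^{t+1}$ given by Proposition 2.2.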

5 Numerical Result 

We present a table of numerical values of $\hat{\rho}(t, r, n)$ which are
obtained from our bounds and the previous bounds. The entry $\alpha$-$\beta$
means that $\alpha \le \hat{\rho}(t, r, n) \le \beta$.




(a) is obtained from Theorem 4.1, (b) is obtained from Proposition 2.2, (c) is
obtained from Theorem 4.2, (d) is obtained from Corollary 4.2, (e) is obtained
from Corollary 4.4, (f) is obtained from Corollary 4.3, and (g) is obtained from
Proposition 2.1. Unmarked values are obtained from $\rho(r, n)$.

Abstract. In this paper we show some efficient and unconditionally secure
oblivious transfer reductions. Our main tool is a class of functions
that generalizes the Zig-zag functions, introduced by Brassard, Crépeau,
and Santha in [6]. We show necessary and sufficient conditions for the
existence of such generalized functions, and some characterizations in
terms of well known combinatorial structures. Moreover, we point out
an interesting relation between these functions and ramp secret sharing
schemes where each share is a single bit.

Keywords: Oblivious Transfer, Zig-zag Functions, Ramp Schemes. 



1 Introduction 

The oblivious transfer is a well known cryptographic primitive. Introduced by
Rabin in [24], and subsequently defined in different forms in [16,5], it has found
many applications in cryptographic studies and protocol design. One of the most
common forms in which the oblivious transfer is used is the following^1 [5]: Let
$\mathcal{S}$, the Sender, and $\mathcal{R}$, the Receiver, be two players.
Assume that $\mathcal{S}$ holds $n$ secrets of $\ell$ bits and $\mathcal{R}$ is
interested in one of them, say the $i$-th one. An oblivious transfer protocol
enables $\mathcal{R}$ to receive the $i$-th secret out of the $n$ that
$\mathcal{S}$ holds in such a way that

- $\mathcal{S}$ does not know which of the $n$ secrets $\mathcal{R}$ has received

- $\mathcal{R}$ does not receive any information on the other secrets $\mathcal{S}$ holds.
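The two requirements above can be pictured as an ideal functionality run by a trusted third party. The sketch below only models what each player learns; it is not a secure protocol and not a construction from this paper. The function name and the toy parameters are assumptions.

```python
import secrets

def ideal_ot(sender_secrets, choice):
    """Trusted-party model of 1-out-of-n oblivious transfer.

    The receiver's output is the chosen secret only; the sender's output is
    empty, so the sender learns nothing about the choice.
    """
    if not 0 <= choice < len(sender_secrets):
        raise IndexError("choice must index one of the n secrets")
    receiver_output = sender_secrets[choice]
    sender_output = None
    return sender_output, receiver_output

ell = 8                                              # secret length in bits (toy value)
table = [secrets.randbits(ell) for _ in range(4)]    # n = 4 secrets held by the sender
sender_view, received = ideal_ot(table, 2)
print(sender_view, received == table[2])
```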

We will refer to such a protocol as a $\binom{n}{1}$-OT$^{\ell}$. All the
oblivious transfer definitions [24,16,5] were shown to be equivalent [12,4,13,6].
Moreover, Kilian, in [21], showed that the oblivious transfer is complete; in
other words, it can be used to construct any other cryptographic protocol. Due to
the importance of the oblivious transfer, many papers [6,12,11,13,14,22,23],
assuming that a $\binom{n}{1}$-OT$^{\ell}$ is available, have focused on designing
protocols that realize a $\binom{N}{1}$-OT$^{L}$, where $N \geq n$ and
$L \geq \ell$, making efficient use of the given $\binom{n}{1}$-OT$^{\ell}$.
Such protocols are usually referred to as oblivious transfer reductions.

¹ Recently, it has been pointed out that Wiesner independently developed a similar
concept in 1970, unpublished until [27].

In [14], unconditionally secure oblivious transfer reductions have been studied.
Lower bounds on the number of times a $\binom{n}{1}$-OT$^{\ell}$ oblivious
transfer protocol must be called to realize a $\binom{N}{1}$-OT$^{L}$ one, as
well as on the number of random bits needed to implement such a reduction, have
been proven. The bounds were shown to be tight when the parameter $L = \ell$.
Unfortunately, when $L > \ell$, the trivial extension of the described protocol
leaks some information: a cheating receiver is able to obtain pieces of
different secrets.

In this paper we focus our attention on unconditionally secure reductions of
$\binom{N}{1}$-OT$^{L}$ to $\binom{n}{1}$-OT$^{\ell}$. We show how to modify the
protocol proposed in [14] in order to avoid information leakage. To this aim, we
investigate the properties of a class of functions that generalizes the Zig-zag
function class introduced by Brassard, Crépeau, and Santha in [6] in order to
reduce, in an unconditionally secure way, $\binom{2}{1}$-OT$^{\ell}$ to
$\binom{2}{1}$-OT$^{1}$. Using these generalized Zig-zag functions we set up an
unconditionally secure oblivious transfer reduction of $\binom{N}{1}$-OT$^{L}$ to
$\binom{n}{1}$-OT$^{\ell}$, which is optimal up to a small multiplicative constant
with respect to the number of invocations of the smaller oblivious transfer needed
to implement such a reduction [14].

Zig-zag functions have been studied extensively in recent years. The authors
of [6] showed that linear Zig-zag functions are equivalent to a special class of
codes, the self-intersecting codes [9]. Moreover, they described several efficient
methods to construct these codes. On the other hand, Stinson, in [25], found
bounds and combinatorial characterizations both for linear and for non-linear
Zig-zag functions. Applying techniques developed in [25,26], we show necessary
and sufficient conditions for the existence of generalized Zig-zag functions, and
some characterizations in terms of orthogonal arrays and large sets of orthogonal
arrays as well.

Then, we show that the reduction presented in [14] can be viewed as a two-stage
process, and using a ramp secret sharing scheme [1] in the first stage, we set up
a reduction of $\binom{N}{1}$-OT$^{L}$ to $\binom{n}{1}$-OT$^{\ell}$, which is
optimal with respect to the number of invocations of the available
$\binom{n}{1}$-OT$^{\ell}$, up to a factor 2.

Finally, we point out an interesting relation between generalized Zig-zags and 
ramp secret sharing schemes where the size of each share is exactly one bit. 



2 Oblivious Transfer 

The following definitions were given by Brassard, Crépeau, and Santha in [6]
and were used, in a slightly simplified form², in [14]. We refer the reader to [6]
for more details.

² The goal of that paper was to find lower bounds, and the awareness condition
does not influence them in any way.

Assume that $\mathcal{S}$ and $\mathcal{R}$ hold two programs, $S$ and $R$
respectively, which specify the computations to be performed by the players to
achieve $\binom{N}{1}$-OT$^{L}$. These programs encapsulate, as a black box, an
ideal $\binom{n}{1}$-OT$^{\ell}$. Hence, during the execution, $\mathcal{S}$ and
$\mathcal{R}$ are able to carry out an unconditionally secure
$\binom{n}{1}$-OT$^{\ell}$ many times. In order to model dishonest behaviour,
where one of the players tries to obtain unauthorized information from the other,
we assume that a cheating $\mathcal{S}$ (resp. $\mathcal{R}$) holds a modified
version of the program, denoted by $\tilde{S}$ (resp. $\tilde{R}$).

Let $[P_0, P_1](a)(b)$ be the random variable representing the output obtained
by $\mathcal{S}$ and $\mathcal{R}$ when they execute together their own programs,
$P_0$ held by $\mathcal{S}$ and $P_1$ held by $\mathcal{R}$, with private inputs
$a$ and $b$, respectively. Moreover, let $[P_0, P_1]^{*}(a)(b)$ be the random
variable that describes the total information acquired during the execution of
the protocol on inputs $a$ and $b$, and let $[P_0, P_1]_{\mathcal{S}}(a)(b)$
(resp. $[P_0, P_1]_{\mathcal{R}}(a)(b)$) be the random variable obtained by
restricting $[P_0, P_1]^{*}(a)(b)$ to $\mathcal{S}$ (resp. to $\mathcal{R}$).
These restrictions are the view each player has while running the protocol.

Finally, let $W$ be the set of all length $N$ sequences of $L$-bit secrets, and,
for any $w \in W$, let $w_i$ be the $i$-th secret of the sequence. Denoting by
$\mathbf{W}$ the random variable that represents the choice of an element in $W$,
and by $\mathbf{T}$ the random variable representing the choice of an index $i$
in $T = \{1, \ldots, N\}$, we can define the conditions that a
$\binom{N}{1}$-OT$^{L}$ oblivious transfer protocol must satisfy as follows:

Definition 1. The pair of programs $[S, R]$ is correct for
$\binom{N}{1}$-OT$^{L}$ if for each $w \in W$ and for each $i \in T$

    $P\big([S, R](w)(i) \neq (\epsilon, w_i)\big) = 0$,    (1)

and, for any program $\tilde{S}$, there exists a probabilistic program $Sim$ such
that, for each $w \in W$ and $i \in T$

    $\big([\tilde{S}, R](w)(i) \mid R \text{ accepts}\big) = \big([S, R](Sim(w))(i) \mid R \text{ accepts}\big)$.    (2)

Notice that condition (1) means that two honest players always complete
successfully the execution of the protocol. More precisely, $\mathcal{R}$ receives
$w_i$, the secret in which he is interested, while $\mathcal{S}$ receives nothing.
The output pair $(\epsilon, w_i)$, where $\epsilon$ denotes the empty string,
describes this situation. On the other hand, condition (2), referred to as the
awareness condition, means that, when $\mathcal{R}$ does not abort, a dishonest
$\mathcal{S}$ cannot induce on $\mathcal{R}$'s output a distribution that he could
not induce by changing the input ($Sim(w)$) and being honest. As explained in
[6], this condition is necessary for future uses of the output of the protocol.

Assuming that both $\mathcal{S}$ and $\mathcal{R}$ are aware of the joint
probability distribution $P_{\mathbf{W},\mathbf{T}}$ on $W$ and $T$, the
probability with which $\mathcal{S}$ chooses the secrets in $W$ and $\mathcal{R}$
chooses an index $i \in T$, and using the mutual information³ between two random
variables, the privacy property of $\binom{N}{1}$-OT$^{L}$ can be defined as
follows:

Definition 2. The pair of programs $[S, R]$ is private for
$\binom{N}{1}$-OT$^{L}$ if for each $w \in W$ and $i \in T$, for any program
$\tilde{S}$

    $I\big(\mathbf{T}; [\tilde{S}, R]_{\mathcal{S}}(\mathbf{W})(\mathbf{T}) \mid \mathbf{W}\big) = 0$,    (3)

while, for any program $\tilde{R}$, there exists a random variable
$\tilde{\mathbf{T}} = f(\mathbf{T})$ such that

    $I\big(\mathbf{W}; [S, \tilde{R}]_{\mathcal{R}}(\mathbf{W})(\mathbf{T}) \mid \mathbf{T}, \mathbf{W}_{\tilde{\mathbf{T}}}\big) = 0$.    (4)

³ The reader is referred to Appendix A for the definition and some basic
properties of the concept of mutual information.

These two conditions ensure that a dishonest $\mathcal{S}$ does not gain
information about $\mathcal{R}$'s index, and that a dishonest $\mathcal{R}$ infers
at most one secret among the ones held by $\mathcal{S}$.

3 Unconditionally Secure Reductions

Many unconditionally secure reductions of more "complex" OT to "simpler" ones can
be found in the literature [11,12,4,14]. The efficiency of such reductions has
been carefully analyzed in [14]. Therein, the authors considered two types of
reductions: reductions for strong $\binom{N}{1}$-OT$^{L}$, where condition (4) of
Definition 2 holds, and reductions for weak $\binom{N}{1}$-OT$^{L}$, where
condition (4) is replaced by the following condition:

for any program $\tilde{R}$ and $i \in T$, it holds that

    $I\big(\mathbf{W}; [S, \tilde{R}]_{\mathcal{R}}(\mathbf{W})(i)\big) \leq L$.    (5)

Roughly speaking, in a weak reduction, a dishonest $\mathcal{R}$ can gain partial
information about several secrets, but at most $L$ bits overall. Besides, they
termed natural reductions the reductions where the receiver $\mathcal{R}$ sends no
messages to the sender $\mathcal{S}$. This automatically implies that condition
(3) of Definition 2 is satisfied. Using the above terminology, they showed the
following lower bounds on the number $\alpha$ of invocations of the ideal
$\binom{n}{1}$-OT$^{\ell}$ sub-protocol that the $\binom{N}{1}$-OT$^{L}$ protocol
must make, and on the number of random bits required to implement the
$\binom{N}{1}$-OT$^{L}$.

Theorem 1. [14] Any information-theoretically secure reduction of weak
$\binom{N}{1}$-OT$^{L}$ to $\binom{n}{1}$-OT$^{\ell}$ must have
$\alpha \geq \frac{L}{\ell} \cdot \frac{N-1}{n-1}$.

Theorem 2. [14] In any information-theoretic natural reduction of weak
$\binom{N}{1}$-OT$^{L}$ to $\binom{n}{1}$-OT$^{\ell}$ the sender must flip at
least $\frac{L(N-n)}{n-1}$ random bits.

When $L = \ell$, the bounds are tight both for the strong and the weak case, since
they showed a protocol realizing $\binom{N}{1}$-OT$^{L}$, where $N > n$, which
makes exactly $\frac{N-1}{n-1}$ invocations of the $\binom{n}{1}$-OT$^{\ell}$ and
flips exactly $\frac{L(N-n)}{n-1}$ random bits [14].

However, for the case $L > \ell$, they gave a protocol (see Table 1), which is
optimal with respect to condition (5), but which does not meet condition (4).
The idea is simply to split each of the $N$ secret strings into $L/\ell$ pieces
of $\ell$ bits, and to run the available $\binom{N}{1}$-OT$^{\ell}$, optimal with
respect to the use of the $\binom{n}{1}$-OT$^{\ell}$ black box, exactly $L/\ell$
times.

Table 1. Basic protocol for a weak reduction

Protocol weakly reducing $\binom{N}{1}$-OT$^{L}$ (with $L > \ell$) to
$\binom{n}{1}$-OT$^{\ell}$

Assume that $\ell \mid L$.

- Let $w = w_1, \ldots, w_N$ be the length $N$ sequence of secrets $\mathcal{S}$
  holds. For each $i = 1, \ldots, N$, $w_i$ is a string of $L$ bits.

- Split the strings into $L/\ell$ pieces. More precisely, let
  $w_i = w_i^1 \cdots w_i^{L/\ell}$, where $w_i^j \in \{0,1\}^{\ell}$ for each
  $j = 1, \ldots, L/\ell$.

- For $j = 1, \ldots, L/\ell$, execute an oblivious transfer of the $j$-th piece
  of $w = w_1, \ldots, w_N$. In other words, compute

      $\binom{N}{1}$-OT$^{\ell}$ on $(w_1^j, \ldots, w_N^j)$,

  where the $\binom{N}{1}$-OT$^{\ell}$ is the reduction of
  $\binom{N}{1}$-OT$^{\ell}$ to $\binom{n}{1}$-OT$^{\ell}$ described in [14].

An honest $\mathcal{R}$ always obtains the secret in which he is interested,
recovering the "right" pieces at each execution. On the other hand, a cheating
$\mathcal{R}$ is able to recover $L/\ell$ pieces of possibly different secrets
among $w = w_1, \ldots, w_N$. We would like to modify this basic construction in
order to achieve condition (4) without losing too much in efficiency.
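The leak in the basic construction is easy to reproduce. The following sketch models the piecewise protocol with an idealized short-string OT; the helper names (`ideal_ot`, `split`, `weak_reduction`), the bit-string representation, and the 0-based indices are assumptions made for illustration, and the inner reduction of [14] is replaced by a black box.

```python
def ideal_ot(pieces, index):
    """Hypothetical ideal 1-out-of-N OT of short strings (black box)."""
    return pieces[index]


def split(secret, piece_len):
    """Split a bit string into consecutive pieces of piece_len bits."""
    if len(secret) % piece_len != 0:
        raise ValueError("piece length must divide the secret length")
    return [secret[k:k + piece_len] for k in range(0, len(secret), piece_len)]


def weak_reduction(secrets_list, piece_len, choices):
    """Run one short OT per piece; choices[j] is the index used in round j.

    An honest receiver uses the same index in every round.  A cheating
    receiver may vary it, which is exactly the leak described above.
    """
    pieces = [split(w, piece_len) for w in secrets_list]
    rounds = len(pieces[0])
    received = [ideal_ot([p[j] for p in pieces], choices[j]) for j in range(rounds)]
    return "".join(received)


w = ["00001111", "10101010", "11110000"]   # N = 3 secrets, L = 8 bits
honest = weak_reduction(w, 4, [1, 1])      # always index 1
cheating = weak_reduction(w, 4, [0, 2])    # first half of w[0], second half of w[2]
```

Here the cheating receiver ends up with four bits of one secret and four bits of another: still at most L bits overall, as condition (5) allows, but spread over two secrets, which condition (4) forbids.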

Brassard, Crépeau, and Santha solved a similar problem in [6]. They studied how
to reduce $\binom{2}{1}$-OT$^{\ell}$ to $\binom{2}{1}$-OT$^{1}$ in an
information-theoretically secure way. Starting from the observation that trivial
serial executions of $\ell$ $\binom{2}{1}$-OT$^{1}$ oblivious transfers, one for
each bit of the two secret strings $w_0$ and $w_1$, did not work, they pursued the
idea of finding a function $f$ where, given $x_0$ and $x_1$ such that
$f(x_0) = w_0$ and $f(x_1) = w_1$, from two disjoint subsets of bits of $x_0$ and
$x_1$ it is possible to gain information on at most one of $w_0$ and $w_1$. Using
such a (public) function, the reduction is simple to implement (see Table 2).



Table 2. Protocol for two secrets of $\ell$ bits

Protocol strongly reducing $\binom{2}{1}$-OT$^{\ell}$ to $\binom{2}{1}$-OT$^{1}$

- $\mathcal{S}$ picks random $x_0, x_1 \in \{0,1\}^n$ such that $f(x_0) = w_0$ and
  $f(x_1) = w_1$

- For $i = 1, \ldots, n$, $\mathcal{S}$ performs a $\binom{2}{1}$-OT$^{1}$ on the
  pair $(x_0^i, x_1^i)$

- $\mathcal{R}$ recovers $w_0$ or $w_1$ by computing $f(x_0)$ or $f(x_1)$.

The property of $f$ ensures that an honest receiver is always able to recover
one of the secrets, while a dishonest receiver can obtain information on at most
one of the secrets. They called such functions Zig-zag functions.
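The protocol of Table 2 can be sketched in a few lines. The example below is an illustration under stated assumptions: it uses the linear function of Appendix B (6 input bits, 3 output bits) as the Zig-zag function, an idealized bit OT named `bit_ot`, and a pre-image sampler that relies on the last three columns of M forming an identity matrix. None of these helper names come from the paper.

```python
import random

# Generator matrix of the linear function f(x) = x M^T over GF(2), taken
# from Appendix B (n = 6 input bits, m = 3 output bits).
M = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]


def f(x):
    """Evaluate f(x) = x M^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def random_preimage(y, rng):
    """Pick a uniformly random x with f(x) = y.

    The last three columns of M form an identity matrix, so the first three
    bits can be chosen freely and the last three are then determined.
    """
    free = [rng.randrange(2) for _ in range(3)]
    fixed = [(y[j] + sum(M[j][i] * free[i] for i in range(3))) % 2 for j in range(3)]
    return tuple(free + fixed)


def bit_ot(b0, b1, choice):
    """Hypothetical ideal 1-out-of-2 OT of single bits."""
    return b1 if choice else b0


def string_ot(w0, w1, choice, rng):
    """Table 2: transfer w0 or w1 through n bit OTs on random pre-images."""
    x0, x1 = random_preimage(w0, rng), random_preimage(w1, rng)
    received = tuple(bit_ot(a, b, choice) for a, b in zip(x0, x1))
    return f(received)


rng = random.Random(2001)
w0, w1 = (1, 0, 1), (0, 1, 1)
```

An honest receiver uses the same choice bit in all six bit OTs and recovers the chosen string by applying f; the Zig-zag property is what limits a receiver who varies the choice bit.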

Notice that we have to solve a closely related problem: in our scenario, a
cheating receiver is able to obtain partial information about many secrets. Our
aim is to find a class of functions where disjoint subsets of the strings
$x_1, x_2, \ldots$ give information about at most one of the secrets
$w_1, w_2, \ldots$

4 Generalized Zig-zag Functions 

Let $X = GF(q)$, and let
$X^n = \{(x_1, \ldots, x_n) : x_i \in X, \text{ for } 1 \leq i \leq n\}$.
Moreover, for each $I = \{i_1, \ldots, i_{|I|}\} \subseteq \{1, \ldots, n\}$,
denote by $x_I = (x_{i_1}, \ldots, x_{i_{|I|}})$ the subsequence of $x \in X^n$
indexed by $I$. Finally, let $X_I$ be the set of all possible subsequences $x_I$
for a given $I$.

A function is unbiased with respect to a subset $I$ if the knowledge of the
value of $x_I$ does not give any information about $f(x)$. More formally, we have
the following definition.

Definition 3. Suppose that $f : X^n \to X^m$, where $n \geq m$. Let
$I \subseteq \{1, \ldots, n\}$. We say that $f$ is unbiased with respect to $I$
if, for all possible choices of $x_I \in X_I$, and for every
$(y_1, \ldots, y_m) \in X^m$, there are exactly $q^{n-m-|I|}$ choices for
$x_{\{1, \ldots, n\} \setminus I}$ such that
$f(x_1, \ldots, x_n) = (y_1, \ldots, y_m)$.
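For small parameters, Definition 3 can be checked by exhaustive counting. The sketch below is an illustration only: the function name `is_unbiased`, the 0-based index sets, and the toy parity function are assumptions introduced here.

```python
from itertools import product


def is_unbiased(f, n, m, q, index_set):
    """Brute-force test of Definition 3 for f : GF(q)^n -> GF(q)^m.

    index_set holds 0-based positions.  For every assignment to those
    positions and every output y, the number of completions x with
    f(x) = y must equal q ** (n - m - |I|).
    """
    fixed = sorted(index_set)
    free = [i for i in range(n) if i not in index_set]
    exponent = n - m - len(fixed)
    if exponent < 0:            # the required count would not be an integer
        return False
    target = q ** exponent
    for x_fixed in product(range(q), repeat=len(fixed)):
        counts = {}
        for x_free in product(range(q), repeat=len(free)):
            x = [0] * n
            for i, v in zip(fixed, x_fixed):
                x[i] = v
            for i, v in zip(free, x_free):
                x[i] = v
            y = f(tuple(x))
            counts[y] = counts.get(y, 0) + 1
        if len(counts) != q ** m or any(c != target for c in counts.values()):
            return False
    return True


def parity(x):
    """Toy example: f(x1, x2, x3) = x1 + x2 + x3 over GF(2), so m = 1."""
    return (sum(x) % 2,)
```

For the parity function, any one or two input bits reveal nothing about the output, while all three bits determine it.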

This concept was introduced in [6]. The form in which it is stated here is the
same as in [25]. Since we are going to follow the same approach applied in [25]
to study the properties of linear and nonlinear Zig-zag functions, we prefer this
definition. The definition of Zig-zag functions relies on the unbiased property.

Definition 4. A function $f : X^n \to X^m$ is said to be a Zig-zag function if,
for every $I \subseteq \{1, \ldots, n\}$, $f$ is unbiased with respect to at least
one of $I$ and $\{1, \ldots, n\} \setminus I$.

We would like some "generalized" Zig-zag property, holding for different disjoint
subsets of indices. Roughly speaking, a generalized Zig-zag function should be
unbiased with respect to at least $s - 1$ of the subsets $I_1, \ldots, I_s$ into
which $\{1, \ldots, n\}$ is partitioned (for all possible partitions). More
formally, we can state the following

Definition 5. Let $s$ be an integer such that $2 \leq s \leq n$. A function
$f : X^n \to X^m$ is said to be an $s$-Zig-zag function if, for every set of $s$
subsets $I_1, \ldots, I_s \subseteq \{1, \ldots, n\}$, such that
$\bigcup_i I_i = \{1, \ldots, n\}$ and $I_i \cap I_j = \emptyset$ if $i \neq j$,
$f$ is unbiased with respect to at least $s - 1$ of $I_1, \ldots, I_s$.

In an $s$-Zig-zag function, if $\mathcal{R}$ collects information about $s$ of
the $x_i$'s, for some $s$, then he can get information on at most one $w_i$. If
the above property is satisfied for every $2 \leq s \leq n$, then we say that $f$
is fully Zig-zag (see Appendix B for an example of such a function). Fully Zig-zag
functions enable us to apply the same approach developed in [6] in order to
substitute the real secrets $w_i$ with some pre-images $x_i$ of $w_i$. The
generalized property of the function ensures the privacy of the transfer.

Note: The functions $f : X^n \to X^m$ we are looking for must be efficient to
compute. Moreover, there must exist an efficient procedure to compute a random
pre-image $x \in f^{-1}(y)$ for each $y \in X^m$.

4.1 Zig-zag and Fully Zig-zag Functions 

We briefly review some definitions and known results about Zig-zag functions. A
Zig-zag (resp. $s$-Zig-zag, fully Zig-zag) function is said to be linear if there
exists an $m \times n$ matrix $M$ with entries from $GF(q)$ such that
$f(x) = xM^T$ for all $x \in GF(q)^n$.

The following results have been shown in [25] and are recalled here since they
will be used in the following subsection. The next lemma shows an upper bound
on the size of an index set $I$ with respect to which a function can be unbiased.

Lemma 1. [25] If $f : X^n \to X^m$ is unbiased with respect to $I$, then
$|I| \leq n - m$.

As a consequence, it is possible to show a lower bound on the size $n$ of the
domain of the function, given the size $m$ of the codomain.

Lemma 2. [25] If $f : X^n \to X^m$ is a Zig-zag function, then $n \geq 2m - 1$.

The following theorem establishes that a Zig-zag function is unbiased with
respect to all the subsets of size $m - 1$.

Theorem 3. [25] If $f : X^n \to X^m$ is a Zig-zag function, then $f$ is unbiased
with respect to $I$ for all $I$ such that $|I| = m - 1$.

Moreover, notice that it is not difficult to prove the following result:

Lemma 3. If $f : X^n \to X^m$ is unbiased with respect to $I$, then $f$ is
unbiased with respect to all $I' \subseteq I$.

Using the above results, we can prove our main result of this section: an
equivalence between certain fully Zig-zag functions and Zig-zag functions.

Theorem 4. Let $n \geq 2m - 1$. Then $f : X^n \to X^m$ is a fully Zig-zag function
if and only if $f$ is a Zig-zag function.

Proof. We give the proof for $n = 2m - 1$. The only if part is straightforward.
Indeed, if $f$ is fully Zig-zag, then for each partition $I_1, \ldots, I_s$ of
$\{1, \ldots, n\}$, $f$ is unbiased with respect to at least $s - 1$ subsets out
of the $s$ in the partition. Hence, it is unbiased with respect to at least 1
subset out of the 2 for any possible bipartition of $\{1, \ldots, n\}$.
Therefore, $f$ is Zig-zag.

Assume now that $f$ is Zig-zag. Hence, by definition, for each
$I \subseteq \{1, \ldots, n\}$, $f$ is unbiased with respect to at least one of
$I$ and $\{1, \ldots, n\} \setminus I$.

Let $I_1, \ldots, I_s$ be a partition of $\{1, \ldots, n\}$. We can consider two
cases.

a) There exists a subset $I_i$ of the partition such that $|I_i| > n - m$.
   Consider this subset. By Lemma 1, $f$ is not unbiased with respect to $I_i$;
   since $f$ is Zig-zag, it is therefore unbiased with respect to
   $\{1, \ldots, n\} \setminus I_i$. But
   $\{1, \ldots, n\} \setminus I_i = \bigcup_{j \neq i} I_j$. Hence, applying
   Lemma 3, we can conclude that $f$ is unbiased with respect to all $I_j$, for
   $j \neq i$.

b) For each $i = 1, \ldots, s$, $|I_i| \leq n - m$. Notice that, since
   $n = 2m - 1$,

       $|I_i| \leq n - m \Rightarrow |I_i| \leq 2m - 1 - m \Rightarrow |I_i| \leq m - 1$.

   Since $f$ is a Zig-zag function, applying Theorem 3, we can say that $f$ is
   unbiased with respect to all $I$ with $|I| = m - 1$. Each $I_i$ is contained
   in some subset of size $m - 1$; therefore, by Lemma 3, we can conclude that
   $f$ is unbiased with respect to all of $I_1, \ldots, I_s$.

Therefore, $f$ is fully Zig-zag. □

The proof for $n > 2m - 1$ is similar. We can therefore conclude that the Zig-zag
and fully Zig-zag definitions define the same class of functions. Hence, the known
constructions for Zig-zag functions enable us to improve the protocol described in
Table 1 by substituting the secrets with pre-images of a Zig-zag function, as done
in the protocol described in Table 2 for two secrets. A complete description of
our protocol can be found in Table 3. Moreover, since both in [6] and in [25] it
has been shown that for each $m$ there exist functions $f : X^n \to X^m$, where
$n = O(m)$ and the asymptotic notation hides a small constant, the modified
protocol is still efficient and optimal with respect to the bound obtained in [14]
up to a small multiplicative constant.



Table 3. General protocol, depending on $f$.

Protocol strongly reducing $\binom{N}{1}$-OT$^{L}$ to $\binom{n}{1}$-OT$^{\ell}$

Let $f : \{0,1\}^p \to \{0,1\}^L$ be a fully Zig-zag function such that
$\ell \mid p$.

- $\mathcal{S}$ picks random $x_0, x_1, \ldots, x_{N-1} \in \{0,1\}^p$ such that,
  for $i = 0, \ldots, N - 1$, $f(x_i) = w_i$.

- $\mathcal{S}$ performs the protocol described in Table 1, using
  $x_0, x_1, \ldots, x_{N-1}$ instead of the real secrets
  $w_0, \ldots, w_{N-1}$.

- $\mathcal{R}$ recovers $x_i$, and computes $w_i = f(x_i)$.
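The general protocol combines the two earlier ideas: pre-images of a Zig-zag function are transferred piece by piece. The sketch below illustrates this with assumed parameters (p = 6, L = 3, pieces of 2 bits, N = 4) and the linear function of Appendix B. The helper names and the idealized `ideal_ot` black box are hypothetical stand-ins, not the construction of [14].

```python
import random

# Zig-zag function f(x) = x M^T over GF(2) from Appendix B: p = 6, L = 3.
M = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]


def f(x):
    """Evaluate f(x) = x M^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def random_preimage(y, rng):
    """Uniform x with f(x) = y; the last 3 columns of M are the identity."""
    free = [rng.randrange(2) for _ in range(3)]
    fixed = [(y[j] + sum(M[j][i] * free[i] for i in range(3))) % 2 for j in range(3)]
    return tuple(free + fixed)


def ideal_ot(pieces, index):
    """Hypothetical ideal 1-out-of-N OT of short strings (black box)."""
    return pieces[index]


def strong_reduction(secrets_list, piece_len, choices, rng):
    """Table 3: run the piecewise protocol of Table 1 on pre-images."""
    preimages = [random_preimage(w, rng) for w in secrets_list]
    rounds = len(preimages[0]) // piece_len
    received = ()
    for j in range(rounds):
        lo, hi = j * piece_len, (j + 1) * piece_len
        received += ideal_ot([x[lo:hi] for x in preimages], choices[j])
    return f(received)


rng = random.Random(7)
w = [(0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]   # N = 4 secrets of L = 3 bits
honest = strong_reduction(w, 2, [2, 2, 2], rng)
```

The cost is visible in the round count: three short OTs are used instead of the two that plain splitting would need for ℓ = 2 and a 3-bit secret padded to 4 bits, reflecting the constant-factor overhead n = O(m) mentioned above.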



⁴ After the submission of this extended abstract to the conference, we found out
that Dodis and Micali, working on the journal version of the paper presented at
Eurocrypt '99, have independently obtained the same reduction, which will appear
in the full version of their paper.



4.2 On the Existence of s-Zig-zags

A question that naturally arises is the following: Zig-zag functions are
equivalent to fully Zig-zag functions, but these functions, according to Lemma 2,
exist only if $n \geq 2m - 1$. Do $s$-Zig-zag functions exist when $n < 2m - 1$?
The example reported in Appendix C shows that the answer is affirmative. It is
interesting to investigate some necessary and sufficient conditions for the
existence of such generalized functions. The following lemma extends Lemma 2:

Lemma 4. If an $s$-Zig-zag function $f : X^n \to X^m$ exists, then

    $n \geq \begin{cases} 2m - s + 2, & \text{if } n \text{ and } s \text{ are both odd or both even} \\ 2m - s + 1, & \text{otherwise.} \end{cases}$

Proof. Notice that, by definition, $f$ must be unbiased with respect to at least
$s - 1$ subsets of each possible $s$-partition. It is not difficult to check that
the worst case we have to consider is when a partition has $s - 2$ subsets of
size 1 and two subsets of essentially the same size. Therefore, $f$ must be
unbiased with respect to at least one of the two "big" subsets. Hence, applying
Lemma 1, it follows that

    $\left\lfloor \frac{n - (s - 2)}{2} \right\rfloor \leq n - m$.    (6)

The result follows by simple algebra. □
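The "simple algebra" can be spelled out by splitting on the parity of $n - s$, which is even exactly when $n$ and $s$ are both odd or both even. Starting from inequality (6):

```latex
\begin{align*}
n - s \text{ even:}\quad
  & \left\lfloor \tfrac{n-s+2}{2} \right\rfloor = \tfrac{n-s+2}{2} \le n - m
    \iff n - s + 2 \le 2n - 2m
    \iff n \ge 2m - s + 2, \\
n - s \text{ odd:}\quad
  & \left\lfloor \tfrac{n-s+2}{2} \right\rfloor = \tfrac{n-s+1}{2} \le n - m
    \iff n - s + 1 \le 2n - 2m
    \iff n \ge 2m - s + 1.
\end{align*}
```

For $s = 2$ the two cases collapse to the bound of Lemma 2: $n \geq 2m$ for even $n$ and $n \geq 2m - 1$ for odd $n$, which together say $n \geq 2m - 1$.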



An interesting relation between s-Zig-zag and t-Zig-zag, where t > s, is stated 
by the following lemma, whose proof can be obtained essentially noticing that a 
t-partition is a refinement of an s-partition. 

Lemma 5. If $f : X^n \to X^m$ is $s$-Zig-zag, then $f$ is $t$-Zig-zag for every
$s < t \leq n$.



4.3 A Combinatorial Characterization 

Let $t$ be an integer such that $1 \leq t \leq k$ and $v \geq 2$. An orthogonal
array $OA_{\lambda}(t, k, v)$ is a $\lambda v^t \times k$ array $A$ of $v$
symbols, such that within any $t$ columns of $A$, every possible $t$-tuple of
symbols occurs in exactly $\lambda$ rows of $A$. An orthogonal array is simple if
it does not contain two identical rows. A large set of orthogonal arrays
$OA_{\lambda}(t, k, v)$, denoted $LOA_{\lambda}(t, k, v)$, is a set of
$v^{k-t}/\lambda$ simple $OA_{\lambda}(t, k, v)$, such that every possible
$k$-tuple occurs as a row in exactly one of the orthogonal arrays in the set
(see [20] for the theory and applications of these structures).
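Both definitions are easy to test mechanically on small instances. The sketch below is illustrative only; the function names `is_orthogonal_array` and `is_large_set` and the toy example (binary triples split by parity) are assumptions made here, not objects from the paper.

```python
from itertools import combinations, product


def is_orthogonal_array(rows, t, v, lam):
    """Check that rows form an OA_lam(t, k, v) over the symbols 0..v-1."""
    k = len(rows[0])
    if len(rows) != lam * v ** t:
        return False
    for cols in combinations(range(k), t):
        counts = {}
        for row in rows:
            key = tuple(row[c] for c in cols)
            counts[key] = counts.get(key, 0) + 1
        if any(counts.get(tup, 0) != lam for tup in product(range(v), repeat=t)):
            return False
    return True


def is_large_set(arrays, t, k, v, lam):
    """Check that the arrays are OAs whose rows partition all v**k k-tuples."""
    if len(arrays) != v ** (k - t) // lam:
        return False
    all_rows = [row for a in arrays for row in a]
    covers_once = sorted(all_rows) == sorted(product(range(v), repeat=k))
    return covers_once and all(is_orthogonal_array(a, t, v, lam) for a in arrays)


# Toy example: binary triples split by parity give an LOA_1(2, 3, 2).
even = [r for r in product(range(2), repeat=3) if sum(r) % 2 == 0]
odd = [r for r in product(range(2), repeat=3) if sum(r) % 2 == 1]
```

The two arrays in the toy example are exactly the pre-image sets of the parity function on three bits, which is the link between unbiased functions and large sets that the following theorems exploit.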

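Theorem 5. If $f : X^n \to X^m$ is an $s$-Zig-zag function where $n$ and $s$ have
different parity, and $m \geq \lfloor n/2 \rfloor + \lfloor s/2 \rfloor$, then $f$
is unbiased with respect to all the subsets of size
$\left\lfloor \frac{n-(s-2)}{2} \right\rfloor$.

Proof. Notice that, by Lemma 4, our assumptions imply
$m = \lfloor n/2 \rfloor + \lfloor s/2 \rfloor$. By definition, $f$ is unbiased
with respect to at least $s - 1$ subsets of each $s$-partition of
$\{1, \ldots, n\}$. Suppose there exists a subset $I_i$ such that
$|I_i| = \left\lfloor \frac{n-(s-2)}{2} \right\rfloor$ with respect to which $f$
is biased. Then, it would be possible to define an $s$-partition having $s - 2$
subsets of size 1, the subset $I_i$, and a subset $R$ having size
$\left\lceil \frac{n-(s-2)}{2} \right\rceil$.

Since $f$ is biased with respect to $I_i$, $f$ must be unbiased with respect to
$R$. This is possible only if

    $|R| = \left\lceil \frac{n-(s-2)}{2} \right\rceil \leq n - m \iff m \leq n - \left\lceil \frac{n-(s-2)}{2} \right\rceil$.

Since $\left\lceil \frac{n-(s-2)}{2} \right\rceil = \lceil n/2 \rceil - \lfloor s/2 \rfloor + 1$,
the above inequality is satisfied only if
$m \leq \lfloor n/2 \rfloor + \lfloor s/2 \rfloor - 1$. But
$m \geq \lfloor n/2 \rfloor + \lfloor s/2 \rfloor$ and, hence, we have a
contradiction. □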

The following theorem establishes a necessary and sufficient condition for the 
existence of certain s-Zig-zag functions. 



Theorem 6. An $s$-Zig-zag function $f : X^n \to X^m$, where $n$ and $s$ have
different parity, and $m \geq \lfloor n/2 \rfloor + \lfloor s/2 \rfloor$, exists
if and only if a large set of orthogonal arrays
$LOA_{\lambda}\left(\left\lfloor \frac{n-(s-2)}{2} \right\rfloor, n, q\right)$
with $\lambda = q^{\,n - m - \lfloor (n-(s-2))/2 \rfloor}$ exists.

Proof. The necessity of the condition derives from Theorem 5, analyzing the
arrays containing the pre-images of $f$, as done in [25]. The sufficiency can be
proved as follows: label each of the $q^m$ arrays of the large set with a
different element $y \in X^m$. Denote such an array by $A_y$. Then, define a
function $f : X^n \to X^m$ as

    $f(x_1, \ldots, x_n) = y \iff (x_1, \ldots, x_n) \in A_y$.

The properties of the arrays and the condition
$m \geq \lfloor n/2 \rfloor + \lfloor s/2 \rfloor$ assure that $f$ is
$s$-Zig-zag. □



On the other hand, using the same proof technique, it is possible to show a
sufficient condition for the existence of an $s$-Zig-zag function for any $n$ and
$2 \leq s \leq n$. More precisely, we can state the following

Theorem 7. If a large set of orthogonal arrays
$LOA_{\lambda}\left(\left\lfloor \frac{n-(s-2)}{2} \right\rfloor, n, q\right)$
with $\lambda = q^{\,n - m - \lfloor (n-(s-2))/2 \rfloor}$ exists, then an
$s$-Zig-zag function exists.



5 Towards a General Reduction 

The protocol described before can be conceptually divided into two phases: a
first phase in which $x_i$ is split into several pieces and $\mathcal{R}$ needs
all the pieces to retrieve $x_i$; and a second phase where, once having obtained
$x_i$, $\mathcal{R}$ recovers the secret by computing $y_i = f(x_i)$ for some
function $f$. Since each piece gives partial knowledge of $x_i$, $f$ needs to hide
the value of $y_i$ according to the definition of a correct and private reduction
(i.e., the Zig-zag property). In this section, we show that, using in the first
phase an appropriate ramp secret sharing scheme [1] (see Appendix D for a brief
review of the definition and some basic properties) to share $x_i$, the function
$f$ in the second phase needs weaker requirements than the Zig-zag property. In
this case, the pieces that $\mathcal{R}$ recovers from each transfer are not
substrings of the value $x_i$ he needs to compute the real secret $y_i = f(x_i)$,
but shares that he has to combine according to the given ramp scheme in order to
recover $x_i$.

Actually, notice that the splitting of the strings can be seen as a sharing
according to a $(0, p/\ell, p/\ell)$-RS, where $p$ is $|x_i|$ and $\ell$ is the
size of each share/piece. The questions therefore are: is it possible to design
an overall better protocol, using in the first phase some non-trivial ramp scheme
to share $x_i$? Does there exist a trade-off between what we pay in the first
phase and what we pay in the second phase? Using a generic $(t_1, t_2, n)$-RS,
what properties does $f$ need to satisfy in order to hide $y_i$ from partial
knowledge of $x_i$ as required by our problem? It is not difficult to check that
the condition $f$ needs is the following.

Definition 6. A function $f : X \to Y$ realizes an unconditionally secure
oblivious transfer reduction if and only if, for each set of shares
$\{x_1, \ldots, x_n\}$ for a secret $x \in X$ generated by a given
$(t_1, t_2, n)$-RS, for every sequence of subsets
$I_1, \ldots, I_s \subseteq \{1, \ldots, n\}$, such that
$\bigcup_i I_i = \{1, \ldots, n\}$ and $I_i \cap I_j = \emptyset$ if $i \neq j$,
it holds that

    $H(\mathbf{Y} \mid \mathbf{X}_{I_j}) = H(\mathbf{Y})$

for at least $s - 1$ of $I_1, \ldots, I_s$.

The definition means that at most one subset of shares can give information 
about f{x). 

It is easy to see that, when the ramp secret sharing scheme used in the 
first phase of the protocol is the trivial (0,p,p)-RS (shares/pieces of one bit), 
Definition 6 is equivalent to fully Zig-zag functions. 

An Almost Optimal Reduction. Using a $(n/2, n, n)$-RS it is immediate to see
that, to acquire information on $x_i$, the adversary needs at least $n/2 + 1$
shares. Hence, recovering partial information on one secret rules out the
possibility of recovering partial information on another secret. Notice that with
such a scheme, if each secret has size $p$ and $\ell$ divides $p$, the bound on
the size of the shares (see Appendix D) implies $n \geq 2p/\ell$ (number of
invocations of the given $\binom{N}{1}$-OT$^{\ell}$). An implementation meeting
the bound for several values of $p$ and $\ell$ can be set up using, for example,
the protocol described in [17]. In this case the function $f$ used in the second
phase can be simply the identity function!
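The bound on the number of invocations follows directly from the share-size bound of Appendix D. As a worked step, assuming the secret is a uniformly distributed $p$-bit string (so that its entropy is $p$) and each share is one $\ell$-bit piece:

```latex
\[
\ell \;\ge\; \frac{H(\mathbf{S})}{t_2 - t_1}
     \;=\; \frac{p}{\,n - n/2\,}
     \;=\; \frac{2p}{n}
\quad\Longrightarrow\quad
n \;\ge\; \frac{2p}{\ell}.
\]
```

With $f$ the identity we have $p = L$, so the number of short transfers is $2L/\ell$, twice the $L/\ell$ used by the plain splitting of Table 1. This is the factor 2 by which the reduction misses optimality.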



6 Ramp Secret Sharing Schemes with Shares of One Bit 

Fully Zig-zag, $s$-Zig-zag and Zig-zag functions give rise to ramp secret sharing
schemes with shares of one bit. The idea is the following: the dealer, given one
of these functions, say $f : X^n \to X^m$, chooses a secret $y \in X^m$ and
computes a random pre-image $x \in f^{-1}(y)$. Then, he distributes the secret
among the set of $n$ participants giving, as a share, a single bit of the
pre-image $x$ to each of them. It is immediate to see that

- some subsets of participants do not gain any information about the secret,
  even if they pool together their shares. These subsets are the subsets of
  $\{1, \ldots, n\}$ with respect to which the function $f$ is unbiased.

- some subsets of participants are able to recover partial information about
  the secret. These are the subsets of $\{1, \ldots, n\}$ with respect to which
  $f$ is biased.

- all the participants together are able to recover the whole secret.

The idea of such constructions was recently described in [8] (see Remark 9) as
an application of $\ell$-AONT transforms. In that construction, however, the
dealer distributes among the participants the bits of the image of the secret,
while we distribute the bits of a pre-image of the secret.
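The three kinds of coalitions can be observed on a small instance. The sketch below uses the function of Appendix B as an assumed example and a hypothetical helper `consistent_secrets`, which lists the secrets still possible from a coalition's point of view; participants are numbered from 0.

```python
from itertools import product

# f(x) = x M^T over GF(2), with M taken from Appendix B (n = 6, m = 3).
M = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]


def f(x):
    """Evaluate f(x) = x M^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def consistent_secrets(coalition, observed_bits):
    """Secrets y = f(x) that remain possible given the coalition's bits.

    coalition holds 0-based participant indices; observed_bits[k] is the
    share held by participant coalition[k].
    """
    possible = set()
    for x in product(range(2), repeat=6):
        if all(x[i] == b for i, b in zip(coalition, observed_bits)):
            possible.add(f(x))
    return possible


x = (1, 0, 1, 1, 0, 0)   # one pre-image chosen by the dealer
secret = f(x)            # the shared secret
```

Participants 1 and 2 (indices 0 and 1) form an unbiased subset and cannot exclude any of the 8 secrets; participants 1, 2 and 5 form one of the biased triples and halve the candidate set; all six together pin the secret down.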

7 Conclusions 

In this paper we have shown how to achieve efficient unconditionally secure
reductions of $\binom{N}{1}$-OT$^{L}$ to $\binom{n}{1}$-OT$^{\ell}$, proving that
Zig-zag functions can be used to reduce $\binom{N}{1}$-OT$^{L}$ to
$\binom{n}{1}$-OT$^{\ell}$ for each $N \geq n$ and $L \geq \ell$. Finally, we have
studied a generalization of these functions, identifying a combinatorial
characterization and a relation with ramp schemes with shares of one bit. Some
interesting questions arise from this study. To name a few:

- The constructions presented before are almost optimal but do not meet the
  bounds of Theorems 1 and 2 with equality. Hence, the question of how to reach
  these bounds (if it is possible) is still open.

- Do cryptographic applications of $s$-Zig-zag functions exist? We have pointed
  out the interesting relation with efficient ramp schemes, where each share is
  a single bit. Is it possible to say more?

- Linear Zig-zag functions are equivalent to self-intersecting codes. Is there
  any characterization in terms of codes for $s$-Zig-zag functions? And what
  about some efficient constructions? Is it possible, along the same lines as
  [6], to set up any deterministic or probabilistic method?



A Information Theory Elements

This appendix briefly recalls some elements of information theory (the reader is 
referred to [10] for details). 

Let $\mathbf{X}$ be a random variable taking values on a set $X$ according to a
probability distribution $\{P_{\mathbf{X}}(x)\}_{x \in X}$. The entropy of
$\mathbf{X}$, denoted by $H(\mathbf{X})$, is defined as

    $H(\mathbf{X}) = -\sum_{x \in X} P_{\mathbf{X}}(x) \log P_{\mathbf{X}}(x)$,

where the logarithm is to the base 2. The entropy satisfies

    $0 \leq H(\mathbf{X}) \leq \log |X|$,

where $H(\mathbf{X}) = 0$ if and only if there exists $x_0 \in X$ such that
$\Pr(\mathbf{X} = x_0) = 1$; whereas $H(\mathbf{X}) = \log |X|$ if and only if
$\Pr(\mathbf{X} = x) = 1/|X|$ for all $x \in X$. The entropy of a random variable
is usually interpreted as

- a measure of the equidistribution of the random variable

- a measure of the amount of information given on average by the random
  variable

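Given two random variables $\mathbf{X}$ and $\mathbf{Y}$ taking values on sets
$X$ and $Y$, respectively, according to the joint probability distribution
$\{P_{\mathbf{XY}}(x, y)\}_{x \in X, y \in Y}$ on their cartesian product, the
conditional entropy $H(\mathbf{X} \mid \mathbf{Y})$ is defined as

    $H(\mathbf{X} \mid \mathbf{Y}) = -\sum_{y \in Y} \sum_{x \in X} P_{\mathbf{Y}}(y) P_{\mathbf{X} \mid \mathbf{Y}}(x \mid y) \log P_{\mathbf{X} \mid \mathbf{Y}}(x \mid y)$.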



It is easy to see that

    $H(\mathbf{X} \mid \mathbf{Y}) \geq 0$,

with equality if and only if $\mathbf{X}$ is a function of $\mathbf{Y}$. The
conditional entropy is a measure of the amount of information that $\mathbf{X}$
still has, once given $\mathbf{Y}$.

The mutual information between $\mathbf{X}$ and $\mathbf{Y}$ is given by

    $I(\mathbf{X}; \mathbf{Y}) = H(\mathbf{X}) - H(\mathbf{X} \mid \mathbf{Y})$,

and it enjoys the following properties:

    $I(\mathbf{X}; \mathbf{Y}) = I(\mathbf{Y}; \mathbf{X})$, and $I(\mathbf{X}; \mathbf{Y}) \geq 0$.

The mutual information is a measure of the common information between X 
and Y. 
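These quantities are straightforward to compute for small finite distributions. The following sketch is only an illustration; the function names and the dictionary representation of distributions are assumptions made here.

```python
from math import log2


def entropy(dist):
    """Shannon entropy (base 2) of a distribution given as {value: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal of a joint distribution {(x, y): prob} on the given axis."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out


def conditional_entropy(joint):
    """H(X|Y) for a joint distribution {(x, y): prob}."""
    p_y = marginal(joint, 1)
    return -sum(p * log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0)


def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y)."""
    return entropy(marginal(joint, 0)) - conditional_entropy(joint)


# X is a uniform bit; Y = X in the first case, Y is independent in the second.
copy = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
```

When Y is a copy of X the mutual information is one full bit, and when the two are independent it is zero, matching the interpretation of I(X;Y) as common information.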



B A Fully Zig-zag Function 



In this section, we show an example of a fully Zig-zag function. Let $X = GF(2)$,
and let $f : X^6 \to X^3$ be the function defined by $f(x) = xM^T$, where

    $M = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}$


To prove that $f$ is fully Zig-zag it is necessary to show that, for any
$2 \leq s \leq 6$, for each partition of $\{1, \ldots, 6\}$ into $s$ parts, $f$
is unbiased with respect to at least $s - 1$ of them. An easy proof can be
obtained using the following theorem, which can be found in [25].

Theorem 8. Let $M$ be a generating matrix for an $[n, m]$ $q$-ary code $C$, and
let $H$ be a parity-check matrix for $C$. The function $f(x) = xM^T$ is unbiased
with respect to $I \subseteq \{1, \ldots, n\}$ if and only if the columns of $H$
indexed by $I$ are linearly independent.

The parity-check matrix $H$ for the generating matrix $M$ is

    $H = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}$


Applying the above theorem, it is not difficult to see that $f$ is unbiased with
respect to

a) any subset of size 1.

b) any subset of size 2.

c) any subset of size 3, except {1, 2, 5}, {1, 3, 4}, {2, 3, 6}, and {4, 5, 6}.
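The list above, and the fully Zig-zag property it leads to, can be checked mechanically through Theorem 8. The sketch below is illustrative: the helper names are assumptions, indices are 0-based (so the triple {1, 2, 5} appears as (0, 1, 4)), and linear independence over GF(2) is tested by checking that no non-empty subset of columns sums to zero.

```python
from itertools import combinations

# Parity-check matrix H of Appendix B, over GF(2).
H = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]


def columns_independent(index_set):
    """Theorem 8 test: no non-empty subset of the chosen columns sums to 0."""
    cols = [tuple(row[i] for row in H) for i in index_set]
    for r in range(1, len(cols) + 1):
        for sub in combinations(cols, r):
            if all(sum(bits) % 2 == 0 for bits in zip(*sub)):
                return False
    return True


def set_partitions(items):
    """Yield every partition of the list items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition


def is_s_zigzag(n, s):
    """At least s - 1 unbiased blocks in every partition into s blocks."""
    return all(
        sum(columns_independent(block) for block in p) >= s - 1
        for p in set_partitions(list(range(n)))
        if len(p) == s
    )


fully_zigzag = all(is_s_zigzag(6, s) for s in range(2, 7))
biased_triples = [c for c in combinations(range(6), 3) if not columns_independent(c)]
```

The exhaustive search visits all 203 partitions of a 6-element set, which is small enough that no pruning is needed.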

Therefore, for any $2 \leq s \leq 6$, and for any $s$-partition, $f$ is unbiased
with respect to at least $s - 1$ of the $s$ subsets.



C An Example of an s-Zig-zag

In this Appendix we show an example of a 3-Zig-zag function (where $n < 2m - 1$).
Let $X = GF(2)$, and let $f : X^4 \to X^3$ be the function defined by
$f(x) = xM^T$, where



    $M = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$

In this case, the parity-check matrix $H$ for the generating matrix $M$ is simply

    $H = \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix}$

Applying Theorem 8, it is easy to see that $f$ is unbiased with respect to each
subset of size 1. Since any 3-partition contains 2 subsets of size 1 and a subset
of size 2, it follows that $f$ is unbiased with respect to exactly 2 subsets.

Hence, s-Zig-zag functions can exist where Zig-zag functions and fully Zig-zag 
functions cannot exist. 
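This example is small enough to be verified directly from Definition 3, without going through the parity-check matrix. The sketch below is an illustration with assumed helper names and 0-based indices; it counts pre-images exhaustively for every block of every partition.

```python
from itertools import product

# Generator matrix of Appendix C: f(x) = x M^T over GF(2), n = 4, m = 3.
M = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
N, M_OUT = 4, 3


def f(x):
    """Evaluate f(x) = x M^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def is_unbiased(block):
    """Definition 3 by exhaustive counting over GF(2); block is 0-based."""
    if len(block) > N - M_OUT:
        return False
    target = 2 ** (N - M_OUT - len(block))
    for fixed in product(range(2), repeat=len(block)):
        counts = {}
        for x in product(range(2), repeat=N):
            if all(x[i] == b for i, b in zip(block, fixed)):
                counts[f(x)] = counts.get(f(x), 0) + 1
        if len(counts) != 2 ** M_OUT or set(counts.values()) != {target}:
            return False
    return True


def set_partitions(items):
    """Yield every partition of the list items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        yield [[first]] + partition


def is_s_zigzag(s):
    """At least s - 1 unbiased blocks in every partition into s blocks."""
    return all(
        sum(is_unbiased(block) for block in p) >= s - 1
        for p in set_partitions(list(range(N)))
        if len(p) == s
    )
```

The check confirms that the function is 3-Zig-zag but not Zig-zag in the original sense: a bipartition into two pairs leaves both blocks biased.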



D Ramp Secret Sharing Schemes 

A ramp secret sharing scheme ($(t_1, t_2, n)$-RS, for short) is a protocol by
means of which a dealer distributes a secret $s$ among a set of $n$ participants
$\mathcal{P}$ in such a way that subsets of $\mathcal{P}$ of size greater than or
equal to $t_2$ can reconstruct the value of $s$, any subset of $\mathcal{P}$ of
size less than or equal to $t_1$ cannot determine anything about the value of the
secret, while a subset of size $t_1 < t < t_2$ can recover some information about
the secret [1]. Using information theory, the three properties of a (linear)
$(t_1, t_2, n)$-RS can be stated as follows. Assuming that $P$ denotes both a
subset of participants and the set of shares these participants receive from the
dealer to share a secret $s \in S$, and denoting the corresponding random
variables in bold, it holds that

- Any subset of participants of size less than or equal to $t_1$ has no
  information on the secret value. Formally, for each subset
  $P \subseteq \mathcal{P}$ of size $|P| \leq t_1$,
  $H(\mathbf{S} \mid \mathbf{P}) = H(\mathbf{S})$.

- Any subset of participants of size $t_1 < |P| < t_2$ has some information on
  the secret value. Formally, for each subset $P \subseteq \mathcal{P}$ of size
  $t_1 < |P| < t_2$,
  $H(\mathbf{S} \mid \mathbf{P}) = \frac{t_2 - |P|}{t_2 - t_1} H(\mathbf{S})$.

- Any subset of participants of size greater than or equal to $t_2$ can compute
  the whole secret. Formally, for each subset $P \subseteq \mathcal{P}$ of size
  $|P| \geq t_2$, $H(\mathbf{S} \mid \mathbf{P}) = 0$.

In a (ti, t 2 j ?^)-RS, the size of each share must be greater than or equal to 
IS (see [7,19]). 
Abstract. We show that there is a very straightforward closed algebraic
formula for the Rijndael block cipher. This formula is highly structured
and far simpler than algebraic formulations of any other block cipher we
know. The security of Rijndael depends on a new and untested hardness
assumption: it is computationally infeasible to solve equations of this
type. The lack of research on this new assumption raises concerns over
the wisdom of using Rijndael for security-critical applications.



1 Introduction 

Rijndael has been selected by NIST to become the AES. In this paper we look 
at the algebraic structure in Rijndael. After RC6, Rijndael is the most elegant 
of the AES finalists. It turns out that this elegant structure also results in an 
elegant algebraic representation of the Rijndael cipher. 

We assume that the reader is familiar with Rijndael. We will concentrate on 
the version with 128-bit block size and 128-bit keys, and occasionally mention the 
versions with larger key sizes. Unless otherwise noted, all formulae and equations
are in the field GF(2^8) used by Rijndael.

2 Algebraic Formulae for Rijndael 

In [DR98, section 8.5] the Rijndael designers note that the S-box can be written
as an equation of the form

$$S(x) = w_8 + \sum_{d=0}^{7} w_d x^{255 - 2^d}$$

for certain constants $w_0, \ldots, w_8$.
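The sparse polynomial form of the S-box can be checked numerically. The following is a minimal sketch, not taken from the paper: it builds the S-box from its usual description (inversion, bit-linear map, constant 0x63), interpolates it over GF(2^8), and lists the nonzero coefficients. The helper names `gf_mul`, `gf_pow`, `sbox`, `interpolate` and `evaluate` are illustrative choices made here.

```python
# Illustrative sketch: interpolate the Rijndael S-box as a polynomial over GF(2^8).

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^8) for e >= 0 (0^0 is taken as 1)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def rotl8(b, n):
    """Rotate a byte left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def sbox(x):
    """Inversion with 0 -> 0, then the bit-linear map, then the constant 0x63."""
    y = gf_pow(x, 254)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63

SBOX = [sbox(x) for x in range(256)]

def interpolate(table):
    """Return c[0..255] such that table[x] = sum_k c[k] * x^k for every byte x."""
    coeffs = [0] * 256
    coeffs[0] = table[0]
    for a in range(1, 256):
        a_inv = gf_pow(a, 254)
        p = 1
        for k in range(1, 255):
            p = gf_mul(p, a_inv)              # p = a^(-k)
            coeffs[k] ^= gf_mul(table[a], p)  # c[k] = sum over a != 0 of f(a) * a^(-k)
    total = 0
    for v in table:
        total ^= v
    coeffs[255] = total                       # coefficient of x^255 is the sum of all outputs
    return coeffs

COEFFS = interpolate(SBOX)
NONZERO = {k: c for k, c in enumerate(COEFFS) if c}

def evaluate(x):
    """Evaluate the interpolated polynomial at the byte x."""
    acc = 0
    for k, c in NONZERO.items():
        acc ^= gf_mul(c, gf_pow(x, k))
    return acc

if __name__ == "__main__":
    for k in sorted(NONZERO):
        print(f"x^{k}: 0x{NONZERO[k]:02x}")
```

The nonzero exponents that come out should all be of the form 255 − 2^d, plus the constant term, which is the structure the formula above describes.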

^ Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed 
Martin Company, for the United States Department of Energy under Contract DE- 
AC04-94AL85000. 
The first simplification that we make is to get rid of the constant $w_8$ in that
formula. In a normal round the output of four S-boxes is multiplied by the MDS
matrix and then four key bytes are added to the result. As the key addition and
the MDS matrix are linear, we can replace the $w_8$ constant in the S-box by the
addition of a suitable constant to the key bytes [MR00]. In the last round there
is no MDS matrix but there is still a key addition, so the same trick works. This
gives us the following formula for the S-box

$$S(x) = \sum_{d=0}^{7} w_d x^{255 - 2^d}$$

and we have to keep in mind that we now work with a modified key schedule
where a suitable constant is added to each expanded key byte.

The next simplification is to rewrite the equation as

$$S(x) = \sum_{d=0}^{7} w_d x^{-2^d}$$

which is nearly equivalent as $x^{255} = 1$ for all $x$ except $x = 0$. For the remainder of
the paper we introduce the convention that $a/0 := 0$ for any value $a$ in GF(2^8).
This makes the new equation equivalent to the previous one.^1
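The equivalence of the two exponent forms under the convention a/0 := 0 can be confirmed by brute force over all 256 field elements. This is a small sketch under the assumption that inversion is implemented as x^254 (which maps 0 to 0); the function names are illustrative and not from the paper.

```python
# Illustrative check: x^(255 - 2^d) equals (1/x)^(2^d) for every byte, with 1/0 := 0.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^8) for e >= 0."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def gf_inv(x):
    """Inverse as x^254; this sends 0 to 0, matching the convention a/0 := 0."""
    return gf_pow(x, 254)

def exponent_forms_agree():
    """True if both exponent forms coincide on all bytes and all d in 0..7."""
    for x in range(1, 256):
        if gf_pow(x, 255) != 1 or gf_mul(x, gf_inv(x)) != 1:
            return False
    for d in range(8):
        for x in range(256):
            if gf_pow(x, 255 - 2 ** d) != gf_pow(gf_inv(x), 2 ** d):
                return False
    return True

if __name__ == "__main__":
    print(exponent_forms_agree())
```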

The form of this equation can be explained from the structure of the S-box.
The S-box consists of an inversion in GF(2^8) with 0 mapped to 0, followed by
a bit-linear function, followed by the addition of a constant. As noted earlier,
we can move this last constant into the key schedule, so we will ignore it. Any
bit-linear function can be expressed in GF(2^8) by a polynomial where all exponents
are powers of two. The easiest way to see this is to observe that squaring
in GF(2^8) is a bit-linear operation. After all, $(a + b)^2 = a^2 + b^2$ in GF(2^8).
Therefore any polynomial whose exponents are powers of two implements a bit-linear
operation. No two of these polynomials implement the same function, and the
number of polynomials of this form equals the number of bit-linear functions.
Therefore, any bit-linear function can be written as a polynomial with exponents
that are powers of two.
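The two facts used in this argument, that squaring is additive and that the two counts match, are easy to confirm by exhaustive search. The sketch below is illustrative only; the coefficient list `W` is an arbitrary example chosen here, not the constants of the Rijndael S-box.

```python
# Illustrative check: squaring is bit-linear, and so is any polynomial
# whose exponents are powers of two.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

SQUARE = [gf_mul(x, x) for x in range(256)]

# Arbitrary example coefficients w_0..w_7 (an assumption made for this sketch).
W = [0x1D, 0x02, 0xA7, 0x00, 0x5C, 0xFF, 0x31, 0x80]

def linearized(x):
    """Evaluate sum_d W[d] * x^(2^d) by repeated squaring."""
    acc = 0
    p = x                      # p = x^(2^0)
    for w in W:
        acc ^= gf_mul(w, p)
        p = SQUARE[p]          # p = x^(2^(d+1))
    return acc

L_TABLE = [linearized(x) for x in range(256)]

def squaring_is_additive():
    return all(SQUARE[a ^ b] == SQUARE[a] ^ SQUARE[b]
               for a in range(256) for b in range(256))

def polynomial_is_additive():
    return all(L_TABLE[a ^ b] == L_TABLE[a] ^ L_TABLE[b]
               for a in range(256) for b in range(256))

# 8 coefficients of 8 bits each versus 8x8 bit matrices: both give 2^64 objects.
NUM_POLYNOMIALS = 256 ** 8
NUM_BIT_LINEAR_MAPS = 2 ** (8 * 8)

if __name__ == "__main__":
    print(squaring_is_additive(), polynomial_is_additive(),
          NUM_POLYNOMIALS == NUM_BIT_LINEAR_MAPS)
```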

2.1 One-Round Equation 

We use the notation from [FKL+00] to discuss the internal values of a Rijndael
encryption. Let $a_{i,j}^{(r)}$ be the byte at position $(i,j)$ at the input of round $r$. As
usual, state values in Rijndael are represented as a 4 x 4 square of bytes with
the coordinates running from 0 to 3. For convenience we will assume that all
coordinates are reduced modulo 4 so that for example $a_{0,4}^{(r)} = a_{0,0}^{(r)}$.

^1 Handling the case x = 0 correctly might not even be important, depending on
the way we use these equations later. A single Rijndael encryption uses 160 S-box
lookups. For random plaintexts there is a less than 50% chance that the case x = 0
will occur during the encryption. So even if we do not handle the case x = 0 well,
the result still applies to more than half the plaintext/ciphertext pairs.
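The "less than 50%" figure in the footnote can be reproduced with a one-line estimate. The sketch below assumes, as a simplification not stated in the paper, that the 160 S-box inputs behave like independent uniformly random bytes; the function name is illustrative.

```python
# Illustrative estimate: chance that at least one of the S-box inputs is zero,
# assuming independent, uniformly distributed input bytes.

def prob_zero_input(lookups=160, field_size=256):
    """Probability that some lookup has input 0 under the independence assumption."""
    return 1.0 - (1.0 - 1.0 / field_size) ** lookups

if __name__ == "__main__":
    print(prob_zero_input())
```

Under that assumption the probability comes out a little below one half, consistent with the footnote.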
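The first step in a normal round is to apply the S-box to each byte of the
state in the ByteSub step. We get

$$s_{i,j}^{(r)} = S[a_{i,j}^{(r)}] = \sum_{d_r=0}^{7} w_{d_r} \left(a_{i,j}^{(r)}\right)^{-2^{d_r}}$$

where $s_{i,j}^{(r)}$ is the state after ByteSub. The next step is the ShiftRow operation
which we can write as

$$t_{i,j}^{(r)} = s_{i,i+j}^{(r)} = \sum_{d_r=0}^{7} w_{d_r} \left(a_{i,i+j}^{(r)}\right)^{-2^{d_r}}$$

The third step in each round is the MixColumn. We can write this as

$$m_{i,j}^{(r)} = \sum_{e_r=0}^{3} v_{i,e_r} t_{e_r,j}^{(r)}$$

where the $v_{i,j}$ are the coefficients of the MDS matrix. Simple substitution now
gives us

$$m_{i,j}^{(r)} = \sum_{e_r=0}^{3} v_{i,e_r} \sum_{d_r=0}^{7} w_{d_r} \left(a_{e_r,e_r+j}^{(r)}\right)^{-2^{d_r}} = \sum_{e_r=0}^{3} \sum_{d_r=0}^{7} w_{i,e_r,d_r} \left(a_{e_r,e_r+j}^{(r)}\right)^{-2^{d_r}}$$

for some suitable constants $w_{i,j,k}$. The final step of the round is the key addition,
and results in the input to the next round.

$$a_{i,j}^{(r+1)} = m_{i,j}^{(r)} + k_{i,j}^{(r)} = k_{i,j}^{(r)} + \sum_{e_r=0}^{3} \sum_{d_r=0}^{7} w_{i,e_r,d_r} \left(a_{e_r,e_r+j}^{(r)}\right)^{-2^{d_r}}$$

where $k_{i,j}^{(r)}$ is the round key of round $r$ at position $(i, j)$. We now have a fairly
simple algebraic expression for a single round of Rijndael. We can write this
formula in a couple of interesting ways.

$$a_{i,j}^{(r+1)} = k_{i,j}^{(r)} + \sum_{e_r \in \mathcal{E},\, d_r \in \mathcal{D}} w_{i,e_r,d_r} \left(a_{e_r,e_r+j}^{(r)}\right)^{-2^{d_r}} \tag{1}$$

$$a_{i,j}^{(r+1)} = k_{i,j}^{(r)} + \sum_{f_r=0}^{31} w_{i,f_r} \left(a_{\lfloor f_r/8 \rfloor,\, \lfloor f_r/8 \rfloor + j}^{(r)}\right)^{-2^{f_r}} \tag{2}$$

$$a_{i,j}^{(r+1)} = k_{i,j}^{(r)} + \sum_{f_r=0}^{31} w_{i,f_r} \left(a_{f_r,\, f_r + j}^{(r)}\right)^{-2^{\lfloor f_r/4 \rfloor}} \tag{3}$$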
(3) 
Equation (1) is a more compact rewrite of the one we already had. We define
$\mathcal{E} := \{0, \ldots, 3\}$ and $\mathcal{D} := \{0, \ldots, 7\}$ to get the same ranges. Equation (2) is
derived by setting $f_r := 8e_r + d_r$. The $w_{i,f_r}$ are suitable constants. Note that
we do not need to reduce $f_r$ modulo 8 when using it as an exponent as this is
done automatically. In GF(2^8) we have that for all $k$ and all $x$, $x^k = x^{k+255}$.
The exponent $2^{f_r}$ can thus be taken modulo 255, which makes the exponent $2^8$
equivalent to $2^0$. In other words, only the $(f_r \bmod 8)$ part of $f_r$ can affect
the result and we do not need to take the modulo ourselves. Equation (3) is
derived in a similar manner by setting $f_r = 4d_r + e_r$, and requires a suitable
rearrangement of the constants $w_{i,f_r}$. We find (1) the most elegant and will use
that in the rest of the paper. However, one of the other formulae could also have
been used with similar results.
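The claim that the exponent reduces itself modulo 8 can be checked directly. The following sketch (helper names are illustrative, not from the paper) verifies that raising to the power 2^f gives the same result as raising to 2^(f mod 8) for every byte and every f in 0..31, and shows how a combined index f splits back into the pair (e, d) used in equation (2).

```python
# Illustrative check: in GF(2^8), x^(2^f) depends only on f mod 8.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def pow_two_power(x, f):
    """Compute x^(2^f) by squaring f times."""
    for _ in range(f):
        x = gf_mul(x, x)
    return x

def exponent_reduces_mod_8():
    return all(pow_two_power(x, f) == pow_two_power(x, f % 8)
               for f in range(32) for x in range(256))

def split_index(f):
    """Recover (e, d) from f = 8*e + d, with e in 0..3 and d in 0..7."""
    return divmod(f, 8)

if __name__ == "__main__":
    print(exponent_reduces_mod_8(), split_index(19))
```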

Finally there is one more interesting way to rewrite equation (1).

$$a_{i,j}^{(r+1)} = k_{i,j}^{(r)} + \sum_{e_r \in \mathcal{E},\, d_r \in \mathcal{D}} \frac{w_{i,e_r,d_r}}{\left(a_{e_r,e_r+j}^{(r)}\right)^{2^{d_r}}}$$

The reason that this is interesting becomes clear when we start to consider the
formula for two or more rounds. Then $a_{e_r,e_r+j}^{(r)}$ is replaced by a formula with
several terms, and it in turn is raised to an even power. As the field we are
working in has characteristic 2 we can use the Freshman's Dream: $(a + b)^2 =
a^2 + b^2$. This generalises to exponents that are powers of two, and thus it allows
the exponent to be applied to each term individually instead of to the sum
of terms. If we ever want to write out the full expression without the use of
summation symbols this prevents the creation of many cross-product terms and
thus keeps the size of the expression under control.



2.2 Multiple-Round Equations 

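Expressions for multiple rounds of Rijndael are easily derived by substitution.
For simplicity we choose an actual value for $r$. For two rounds of Rijndael we
get

$$a_{i,j}^{(3)} = k_{i,j}^{(2)} + \sum_{e_2 \in \mathcal{E},\, d_2 \in \mathcal{D}} \frac{w_{i,e_2,d_2}}{\left( k_{e_2,e_2+j}^{(1)} + \sum_{e_1 \in \mathcal{E},\, d_1 \in \mathcal{D}} \dfrac{w_{e_2,e_1,d_1}}{\left(a_{e_1,e_1+e_2+j}^{(1)}\right)^{2^{d_1}}} \right)^{2^{d_2}}} \tag{4}$$

and the three-round version is

$$a_{i,j}^{(4)} = k_{i,j}^{(3)} + \sum_{e_3 \in \mathcal{E},\, d_3 \in \mathcal{D}} \frac{w_{i,e_3,d_3}}{\left( k_{e_3,e_3+j}^{(2)} + \sum_{e_2 \in \mathcal{E},\, d_2 \in \mathcal{D}} \dfrac{w_{e_3,e_2,d_2}}{\left( k_{e_2,e_2+e_3+j}^{(1)} + \sum_{e_1 \in \mathcal{E},\, d_1 \in \mathcal{D}} \dfrac{w_{e_2,e_1,d_1}}{\left(a_{e_1,e_1+e_2+e_3+j}^{(1)}\right)^{2^{d_1}}} \right)^{2^{d_2}}} \right)^{2^{d_3}}}$$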
Applying the Freshman's Dream to equation 4 gives us

$$a_{i,j}^{(3)} = k_{i,j}^{(2)} + \sum_{e_2 \in \mathcal{E},\, d_2 \in \mathcal{D}} \frac{w_{i,e_2,d_2}}{\left(k_{e_2,e_2+j}^{(1)}\right)^{2^{d_2}} + \sum_{e_1 \in \mathcal{E},\, d_1 \in \mathcal{D}} \dfrac{\left(w_{e_2,e_1,d_1}\right)^{2^{d_2}}}{\left(a_{e_1,e_1+e_2+j}^{(1)}\right)^{2^{d_1+d_2}}}}$$

in which all exponentiations are on individual terms. This formula still looks
rather complicated, but most of the complications are not essential to the structure
of the formula. The subscripts get more complex the deeper into the recursion
we go, but all subscripts are known and are independent of the key or
plaintext. The same holds for the exponents: they are known and independent
of the plaintext and key. We therefore introduce a somewhat sloppy notation
which clarifies the structure. We write $K$ for any expanded key byte, with the
understanding that the exact position of that key byte in the key schedule is
known to us. All constants are written as $C$ even though they might not be all
the same value. We replace the remaining subscripts and powers by a $*$. Again,
each $*$ stands for a value that we can compute and that is independent of the
plaintext and key. Finally, we use the fact that we can write the inputs to the
first round as $a_{i,j}^{(1)} = p_{4j+i} + k_{i,j}^{(0)}$ where the $p_i$ are the plaintext bytes. All in
all this gives us

$$a_{i,j}^{(3)} = K + \sum_{e_2 \in \mathcal{E},\, d_2 \in \mathcal{D}} \frac{C}{K^* + \sum_{e_1 \in \mathcal{E},\, d_1 \in \mathcal{D}} \dfrac{C}{K^* + p_*^*}}$$



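We can now write the five-round formula

$$a_{i,j}^{(6)} = K + \sum_{e_5 \in \mathcal{E},\, d_5 \in \mathcal{D}} \frac{C}{K^* + \sum_{e_4 \in \mathcal{E},\, d_4 \in \mathcal{D}} \dfrac{C}{K^* + \sum_{e_3 \in \mathcal{E},\, d_3 \in \mathcal{D}} \dfrac{C}{K^* + \sum_{e_2 \in \mathcal{E},\, d_2 \in \mathcal{D}} \dfrac{C}{K^* + \sum_{e_1 \in \mathcal{E},\, d_1 \in \mathcal{D}} \dfrac{C}{K^* + p_*^*}}}}} \tag{5}$$

Keep in mind that every $K$ is some expanded key byte, each $C$ is a known
constant, and each $*$ is a known exponent or subscript, but that these values
depend on the summation variables that enclose the symbol.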

Equation 5 gives us the intermediate values in an encryption after five rounds 
as a function of the plaintext and the expanded key. It is possible to write a 
similar formula for the value after five rounds as a function of the ciphertext 
and the expanded key. The S-box is constructed from an inversion in GF(2^8)
followed by a bit-linear function. The inverse S-box is constructed from a bit-linear
function followed by an inversion. The inverse of the MixColumn operation
is another MixColumn operation with different constants. The inverse cipher is
thus constructed from the same components, and leads to a formula similar
to equation 5. (For simplicity we ignore the fact that there is no MixColumn
operation in the last round, which makes the last round simpler than all other
rounds.)

As the results from equation 5 and the inverse equation must agree, we get 
a closed algebraic equation which consists of two formulae similar to equation 5. 
Alternatively we can write out the 10-round equation which will be about twice 
the size. 



2.3 Fully Expanded Equations 

If we try to write out equation 5 without summation symbols then we get a
very large formula. Instead of a summation we simply write out 32 copies of
the equation that we are summing, and substitute the appropriate summation
variable values in each copy. As there are 5 summations, we end up with about
$2^{25}$ individual terms of the form $C/(K^* + p_*^*)$. This formula would be too large
to include here, but it would fit in the memory of a computer. Even the full 10-round
formula would require only $2^{50}$ terms or so, which is certainly computable
within the workload allowed for an attack on a 128-bit cipher. The 256-bit key
version of Rijndael has 14 rounds. The expanded equation for half the cipher
would have about $2^{35}$ terms, and the expanded formula for the full cipher about
$2^{70}$ terms.
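The term counts quoted here follow from each nested summation ranging over 4 × 8 = 32 index pairs. A minimal sketch of that arithmetic, with illustrative function names:

```python
# Illustrative arithmetic: size of the fully expanded formula after a given
# number of rounds, with 32 (e, d) index pairs per nested summation.

TERMS_PER_SUMMATION = 4 * 8  # |E| * |D|

def expanded_terms(rounds):
    """Number of innermost terms once every summation is written out."""
    return TERMS_PER_SUMMATION ** rounds

def log2_terms(rounds):
    """Base-2 logarithm of the term count (exact, since the count is a power of two)."""
    return expanded_terms(rounds).bit_length() - 1

if __name__ == "__main__":
    for rounds in (5, 10, 7, 14):
        print(rounds, "rounds: 2^%d terms" % log2_terms(rounds))
```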

3 Other Ciphers 

We know of no other ‘serious’ block cipher that has an algebraic description 
that is anywhere near as simple as the one for Rijndael. There are some general 
techniques that work for any block cipher, but these do not lead to practical 
attacks. 

For example, any block cipher can be written as a boolean circuit, and then 
translated to a set of equations with one equation per boolean gate. However, 
this results in a system of equations and not a closed algebraic formula. It is 
equivalent to rewriting the problem of finding the cipher’s key as an instance of 
SAT [Meh84], for which no efficient algorithms are known. 

If one tries to rewrite the equations into a closed formula there is an explosion 
of terms. For example, in DES each output bit of the round function depends
on 6 input bits. The boolean expressions for the S-boxes are fairly complicated
[Kwa00], and each input bit will be used at least 16 times on average in the full
boolean expressions for the output bits. A fully expanded boolean formula for
DES therefore has at least around $16^{16} = 2^{64}$ terms, and due to the 'random'
structure of the S-boxes this formula has no neat structures to take advantage
of. Quite clearly this will not result in an attack that is faster than exhaustive
search.

Another idea is to write the formula for the cipher in conjunctive normal 
form. This results in a simple formula: the entire function can be written using 
some constants and a few summation-type operators. Of course, the underlying 
problem formulation for an attack is still SAT. Furthermore, a direct evaluation
of this formula is impossible. First of all, the constants cannot be determined
without the entire plaintext/ciphertext mapping. Even if the constants were
known, the direct evaluation would cost in the order of $2^{b+k}$ steps where $b$ is the
block size and $k$ the key size. This is obviously slower than an exhaustive search
which requires $2^k$ steps.

4 An Algebraic Attack? 

The real question is of course whether we can turn these formulae for Rijndael 
into an attack. At this moment we do not have an attack on Rijndael that uses 
this algebraic representation. 

If the formula were a simple polynomial it would be trivial to solve, but this is
nothing new. To make the formula a polynomial we would have to eliminate the
$1/x$ function in the S-box. If we simply throw the $1/x$ away it makes the entire
cipher affine, and there are easier ways of attacking an affine cipher. Using the
equivalence $1/x = x^{254}$ and converting the formula to a polynomial leads to a
polynomial with too many terms to be useful.

Another idea would be to write the formula as the quotient of two polynomi- 
als. Again, the number of terms grows very rapidly which makes this approach 
unpromising. 

We feel that an algebraic attack would have to handle equations of the form 
of equation 5 directly. The form of this equation is similar to that of continued 
fractions, and can be seen as a generalisation. There is quite a lot of knowledge
about "solving" continued fractions, but it is unclear to us whether that can be
applied to these formulae. This is outside the area of expertise of the authors.
We therefore have to leave this as an open problem: is there a way of solving for 
the key bytes K in equation 5 given enough plaintext/ciphertext pairs? 

We can give a few observations. A fully expanded version of equation 5 has
$2^{25}$ terms. If we ignore the fact that some of the key bytes in the formula must be
equal we can write it as a formula in about $2^{25}$ individual key bytes. Computing
the same intermediate state from the ciphertext gives us another formula of
similar size, and setting the two equal gives us an equation in about $2^{26}$ expanded
key bytes. From a purely information-theoretical standpoint this would require
at least $2^{22}$ known plaintext/ciphertext pairs, but this is not a problem. The
attack can even afford an algorithm of order $O(n^4)$ in the number of terms
of the equation before the workload exceeds the 128-bit key size limit. Larger
key sizes are even more advantageous to the attacker in finding an attack with
complexity less than that of exhaustive key search. The 256-bit key version uses
14 rounds, so each equation for half the cipher would have about $2^{35}$ terms. An
algebraic equation solver with a workload in the order of $O(n^7)$ in the number
of terms might very well lead to an attack.

If the attack were to use an expanded formula for the full cipher it would
have about $2^{50}$ terms. Again, the required plaintext/ciphertext pairs are not a
problem, and an algorithm that is quadratic in the number of terms is good
enough. For 256-bit keys an $O(n^3)$ algorithm would even be good enough.
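The polynomial degrees mentioned in these observations can be recovered with simple integer arithmetic. The sketch below uses a deliberately crude cost model, assumed here for illustration: a solver that costs n^d steps on an equation with n terms is "affordable" while n^d does not exceed the exhaustive-search workload 2^(key bits). The function name is illustrative.

```python
# Illustrative arithmetic: largest polynomial degree a solver may have before
# its workload exceeds exhaustive key search, under a crude n^d cost model.

def max_affordable_degree(log2_terms, key_bits):
    """Largest d such that (2^log2_terms)^d <= 2^key_bits."""
    return key_bits // log2_terms

if __name__ == "__main__":
    # Two half-cipher formulas set equal: twice the terms of one half.
    print("128-bit key, meet in the middle:", max_affordable_degree(25 + 1, 128))
    print("256-bit key, meet in the middle:", max_affordable_degree(35 + 1, 256))
    # One formula for the full cipher.
    print("128-bit key, full cipher:", max_affordable_degree(50, 128))
    print("256-bit key, full cipher:", max_affordable_degree(70, 256))
```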
Any algorithm to solve these equations can also use the fact that many of
the expanded key bytes in the formula must be equal. After all, there are only 
176 expanded key bytes overall, and all the key bytes in the formula are chosen 
from that set. As we know exactly which key value in the formula corresponds 
to which key byte in the expanded key, we can derive these additional equations. 
The Rijndael key schedule also introduces many linear equations between the 
various expanded key bytes which might be used. 

Note that adding more rounds to Rijndael does not help as much as one would
think. Each extra round adds a factor of $2^5$ to the size of the fully-expanded
equation. Compare this to other attacks where attacking an extra round very
often involves guessing a full round key, which corresponds to a factor of $2^{128}$.

5 Conclusions 

The Rijndael cipher can be expressed in a very neat and compact algebraic for- 
mula. We know of no other cipher for which it is possible to derive an algebraic 
formula that is anywhere near as elegant. This implies that the security of Rijn- 
dael relies on a new computational hardness assumption: it is computationally 
infeasible to solve algebraic equations of this form. As this problem has not been 
studied, we do not know whether this is a reasonable assumption to make. 

This puts us in a difficult situation. We have no attack on Rijndael that uses 
these formulae, but there might very well exist techniques for handling this type 
of formula that we are unaware of, or somebody might develop them in the next 
20 years or so. This is a somewhat disingenuous argument; any cipher could be 
attacked in the future. Yet our experience teaches us that in cryptography it is 
best to be cautious. A system that uses Rijndael automatically bases its security 
on a new hardness assumption, whereas this new assumption can be avoided by 
using a different block cipher. In that light we are concerned about the use of 
Rijndael in security-critical applications. 
Abstract. In [15], Keliher et al. present a new method for upper bounding
the maximum average linear hull probability (MALHP) for SPNs, a
value which is required to make claims about provable security against
linear cryptanalysis. Application of this method to Rijndael (AES) yields
an upper bound of $UB = 2^{-75}$ when 7 or more rounds are approximated,
corresponding to a lower bound on the data complexity of $32/UB = 2^{80}$ (for
a 96.7% success rate). In the current paper, we improve this upper bound
for Rijndael by taking into consideration the distribution of linear probability
values for the (unique) Rijndael 8 x 8 s-box. Our new upper bound
on the MALHP when 9 rounds are approximated is $2^{-92}$, corresponding
to a lower bound on the data complexity of $2^{97}$ (again for a 96.7% success
rate). [This is after completing 43% of the computation; however,
we believe that values have stabilized; see Section 7.]

Keywords: linear cryptanalysis, maximum average linear hull proba- 
bility, provable security, Rijndael, AES 



1 Introduction 

The substitution-permutation network (SPN) [9,1,12] is a fundamental block cipher
architecture based on Shannon's principles of confusion and diffusion [22].
These principles are implemented through substitution and linear transforma- 
tion (LT), respectively. Recently, SPNs have been the focus of increased atten- 
tion. This is due in part to the selection of the SPN Rijndael [6] as the U.S. 
Government Advanced Encryption Standard (AES). 

Linear cryptanalysis (LC) [18] and differential cryptanalysis (DC) [4] are 
generally considered to be the two most powerful cryptanalytic attacks on block 
ciphers. In this paper we focus on the linear cryptanalysis of SPNs. As a first 
attempt to quantify the resistance of a block cipher to LC, the expected linear 
characteristic probability (ELCP) of the best linear characteristic often is eval- 
uated. However, Nyberg [21] showed that the use of linear characteristics can 
underestimate the success of LC. To guarantee provable security, a block cipher 



S. Vaudenay and A. Youssef (Eds.): SAC 2001, LNCS 2259, pp. 112-128, 2001. 
designer needs to consider linear hulls instead of linear characteristics, and the
maximum average linear hull probability (MALHP) instead of the ELCP of the 
best linear characteristic. 

Since the MALHP is difficult, if not infeasible, to compute exactly, researchers
have adopted the approach of upper bounding it [2,13,15]. In [15], Keliher et al.
present a new general method for upper bounding the MALHP for SPNs. They
apply their method to Rijndael, obtaining an upper bound on the MALHP
of $UB = 2^{-75}$ when 7 or more rounds are approximated, corresponding to a
lower bound on the data complexity of $32/UB = 2^{80}$ (for a 96.7% success rate; see
Table 1).^1

The current paper is based on the following observation: the general method 
of Keliher et al. in [15] can potentially be improved by incorporating specific 
information about the distribution of linear probability (LP) values for the SPN 
s-boxes. Due to the fact that Rijndael has only one (repeated) s-box, and because
of the structure of this s-box, this observation applies readily to Rijndael, and
allows us to improve the upper bound on the MALHP to $UB = 2^{-92}$ when
9 rounds are approximated, for a lower bound on the data complexity of $2^{97}$
(again for a 96.7% success rate). (This value is based on completion of 43%
of the computation, although we believe that the values have stabilized; see
Section 7.)

Conventions 

The Hamming weight of a binary vector x is written wt(x). If Z is a random 
variable, E[Z] denotes the expected value of Z. We use #A to indicate the
number of elements in the set A.



2 Substitution-Permutation Networks 

A block cipher is a bijective mapping from $N$ bits to $N$ bits ($N$ is the block size)
parameterized by a bitstring called a key, denoted $\mathbf{k}$. Common block sizes are
64 and 128 bits (we consider Rijndael with a block size of 128 bits). The input
to a block cipher is called a plaintext, and the output is called a ciphertext.

An SPN encrypts a plaintext through a series of $R$ simpler encryption steps
called rounds. (Rijndael with a key size of 128 bits consists of 10 rounds.) The
input to round $r$ ($1 \le r \le R$) is first bitwise XOR'd with an $N$-bit subkey, denoted
$\mathbf{k}^r$, which is typically derived from the key, $\mathbf{k}$, via a separate key-scheduling
algorithm. The substitution stage then partitions the resulting vector into $M$ subblocks
of size $n$ ($N = Mn$), which become the inputs to a row of bijective $n \times n$
substitution boxes (s-boxes), that is, bijective mappings from $\{0,1\}^n$ to $\{0,1\}^n$. Finally,
the permutation stage applies an invertible linear transformation (LT) to the
output of the s-boxes (classically, a bitwise permutation). Often the permutation
stage is omitted from the last round. A final subkey, $\mathbf{k}^{R+1}$, is XOR'd with

^1 In [15], the value $32/UB$ was incorrectly given as $2^{78}$ due to an error in the table
corresponding to Table 1. See Remark 2 for clarification.
the output of round $R$ to form the ciphertext. Figure 1 depicts an example SPN
with $N = 16$, $M = n = 4$, and $R = 3$.

We assume the most general situation for the key, namely, that $\mathbf{k}$ is an
independent key [3], a concatenation of $(R + 1)$ subkeys chosen independently
from the uniform distribution on $\{0,1\}^N$; symbolically, $\mathbf{k} = (\mathbf{k}^1, \mathbf{k}^2, \ldots, \mathbf{k}^{R+1})$.
We use $\mathcal{K}$ to denote the set of all independent keys.



[Figure 1: diagram of a three-round SPN; the rounds are labelled round 1, round 2 and round 3, and each contains a row of s-boxes]

Fig. 1. SPN with N = 16, M = n = 4, R = 3



3 Linear Probability 

In this section, and in Section 4, we make use of some of the treatment and 
notation from Vaudenay [23] . 

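Definition 1. Suppose $B : \{0,1\}^d \to \{0,1\}^d$ is a bijective mapping. Let $\mathbf{a}, \mathbf{b} \in
\{0,1\}^d$ be fixed, and let $\mathbf{X} \in \{0,1\}^d$ be a uniformly distributed random variable.
The linear probability $LP(\mathbf{a}, \mathbf{b})$ is defined as

$$LP(\mathbf{a}, \mathbf{b}) \stackrel{\mathrm{def}}{=} \left(2 \cdot \mathrm{Prob}_{\mathbf{X}}\{\mathbf{a} \bullet \mathbf{X} = \mathbf{b} \bullet B(\mathbf{X})\} - 1\right)^2. \tag{1}$$

If $B$ is parameterized by a key, $\mathbf{k}$, we write $LP(\mathbf{a}, \mathbf{b}; \mathbf{k})$, and the expected LP
(ELP) is defined as

$$ELP(\mathbf{a}, \mathbf{b}) \stackrel{\mathrm{def}}{=} E\left[LP(\mathbf{a}, \mathbf{b}; \mathbf{K})\right],$$

where $\mathbf{K}$ is a random variable uniformly distributed over the space of keys.

Note that LP values lie in the interval [0, 1]. A nonzero LP value indicates
a correlation between the input and output of $B$, with a higher value indicating
a stronger correlation (in fact, $LP(\mathbf{a}, \mathbf{b})$ is the square of entry $[\mathbf{a}, \mathbf{b}]$ in the
correlation matrix for $B$ [5]).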
The values $\mathbf{a}$/$\mathbf{b}$ in Definition 1 are referred to as input/output masks. For our
purposes, the bijective mapping $B$ may be an s-box, a single encryption round,
or a sequence of consecutive encryption rounds.

The following lemma derives immediately from Parseval's Theorem [20].

Lemma 1. Let $B : \{0,1\}^d \to \{0,1\}^d$ be a bijective mapping parameterized by a
key, $\mathbf{k}$, and let $\mathbf{a}, \mathbf{b} \in \{0,1\}^d$. Then

$$\sum_{\mathbf{x} \in \{0,1\}^d} LP(\mathbf{a}, \mathbf{x}; \mathbf{k}) = \sum_{\mathbf{x} \in \{0,1\}^d} LP(\mathbf{x}, \mathbf{b}; \mathbf{k}) = 1$$

$$\sum_{\mathbf{x} \in \{0,1\}^d} ELP(\mathbf{a}, \mathbf{x}) = \sum_{\mathbf{x} \in \{0,1\}^d} ELP(\mathbf{x}, \mathbf{b}) = 1.$$
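Definition 1 and Lemma 1 can be illustrated on a small example. The sketch below computes the full LP table of a 4-bit bijection with exact fractions and checks that every row and every column sums to 1. The 4-bit permutation `SBOX` is an arbitrary example chosen here, not an s-box discussed in the paper, and the function names are illustrative.

```python
# Illustrative sketch: LP table of a small bijective s-box and the Parseval sums.
from fractions import Fraction

D = 4
SIZE = 1 << D
# Arbitrary 4-bit permutation used only as an example.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def dot(u, v):
    """Inner product of two bit vectors over GF(2)."""
    return bin(u & v).count("1") & 1

def lp(a, b):
    """LP(a, b) = (2 * Prob{a.X = b.B(X)} - 1)^2 for X uniform on d-bit values."""
    matches = sum(dot(a, x) == dot(b, SBOX[x]) for x in range(SIZE))
    return (2 * Fraction(matches, SIZE) - 1) ** 2

LP_TABLE = [[lp(a, b) for b in range(SIZE)] for a in range(SIZE)]

def row_sums():
    return [sum(LP_TABLE[a]) for a in range(SIZE)]

def column_sums():
    return [sum(LP_TABLE[a][b] for a in range(SIZE)) for b in range(SIZE)]

if __name__ == "__main__":
    print(row_sums() == [1] * SIZE, column_sums() == [1] * SIZE)
```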



3.1 LP Values for the Rijndael S-box 

Consider the (unique) Rijndael 8x8 s-box (see the Rijndael reference code [7]) as 
the bijective mapping B in Definition 1 . A short computation yields the following 
interesting fact. 

Lemma 2. Let the bijective mapping under consideration be the 8 x 8 Rijndael
s-box. If $\mathbf{a} \in \{0,1\}^8 \setminus \mathbf{0}$ is fixed, and $\mathbf{b}$ varies over $\{0,1\}^8$, then the distribution
of values $LP(\mathbf{a}, \mathbf{b})$ is constant, and is given in the following table ($\rho_i$ is the
LP value, and $\phi_i$ is the number of times it occurs, for $1 \le i \le 9$). The same
distribution is obtained if $\mathbf{b} \in \{0,1\}^8 \setminus \mathbf{0}$ is fixed, and $\mathbf{a}$ varies over $\{0,1\}^8$.

i        1         2         3         4         5         6         7         8         9
rho_i    (8/64)^2  (7/64)^2  (6/64)^2  (5/64)^2  (4/64)^2  (3/64)^2  (2/64)^2  (1/64)^2  0
phi_i    5         16        36        24        34        40        36        48        17
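The "short computation" behind Lemma 2 can be sketched as follows. The code builds the Rijndael s-box from its standard description, fixes a nonzero mask, and tallies the LP values over all 256 masks on the other side. Only a handful of fixed masks are checked here, so this is a spot check rather than the full statement; all names are illustrative.

```python
# Illustrative spot check of the LP value distribution for the Rijndael s-box.
from collections import Counter
from fractions import Fraction

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(x):
    """x^254, which is the inverse of x and maps 0 to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, x)
    return r

def rotl8(b, n):
    return ((b << n) | (b >> (8 - n))) & 0xFF

def sbox(x):
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63

SBOX = [sbox(x) for x in range(256)]
PARITY = [bin(v).count("1") & 1 for v in range(256)]

def lp(a, b):
    """Exact LP(a, b) for the s-box."""
    matches = sum(PARITY[a & x] == PARITY[b & SBOX[x]] for x in range(256))
    return Fraction((2 * matches - 256) ** 2, 256 ** 2)

def distribution_fixed_input(a):
    """Tally of LP(a, b) over all output masks b."""
    return dict(Counter(lp(a, b) for b in range(256)))

def distribution_fixed_output(b):
    """Tally of LP(a, b) over all input masks a."""
    return dict(Counter(lp(a, b) for a in range(256)))

# The table of Lemma 2: LP value (k/64)^2 for k = 8..0 and its number of occurrences.
EXPECTED = {Fraction(k * k, 64 * 64): n
            for k, n in zip(range(8, -1, -1), [5, 16, 36, 24, 34, 40, 36, 48, 17])}

if __name__ == "__main__":
    print(distribution_fixed_input(0x01) == EXPECTED)
```

Extending the loops to all 255 nonzero masks on each side would check the lemma in full, at the cost of a longer run time in pure Python.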



4 Linear Cryptanalysis of Markov Ciphers 

It will be useful to consider linear cryptanalysis (LC) in the general context of 
Markov ciphers [17]. 



4.1 Markov Ciphers 

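Let $\mathcal{E} : \{0,1\}^N \to \{0,1\}^N$ be an $R$-round cipher, for which round $r$ is given by
the function $\mathbf{y} = \epsilon_r(\mathbf{x}; \mathbf{k}^r)$ ($\mathbf{x} \in \{0,1\}^N$ is the round input, and $\mathbf{k}^r \in \{0,1\}^N$ is
the round-$r$ subkey). Then $\mathcal{E}$ is a Markov cipher with respect to the XOR group
operation ($\oplus$) on $\{0,1\}^N$ if, for $1 \le r \le R$, and any $\mathbf{x}, \Delta\mathbf{x}, \Delta\mathbf{y} \in \{0,1\}^N$,

$$\mathrm{Prob}_{\mathbf{K}}\{\epsilon_r(\mathbf{x}; \mathbf{K}) \oplus \epsilon_r(\mathbf{x} \oplus \Delta\mathbf{x}; \mathbf{K}) = \Delta\mathbf{y}\} = \mathrm{Prob}_{\mathbf{K},\mathbf{X}}\{\epsilon_r(\mathbf{X}; \mathbf{K}) \oplus \epsilon_r(\mathbf{X} \oplus \Delta\mathbf{x}; \mathbf{K}) = \Delta\mathbf{y}\} \tag{2}$$

(where $\mathbf{X}$ and $\mathbf{K}$ are uniformly distributed and independent). That is, the probability
over the key that a fixed input difference produces a fixed output difference
is independent of the round input.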

It is easy to show that the SPN model we are using is a Markov cipher, as 
are certain Feistel ciphers [10], such as DES [8]. 

Remark 1. The material in the remainder of Section 4 applies to any Markov 
cipher. Although we are dealing with LC, which ostensibly does not involve 
the ⊕ operation, the relevance of the Markov property given in (2) is via an
interesting connection between linear probability and differential probability (see, 
for example, equations (3) and (4) in [23]). 

4.2 Linear Cryptanalysis 

Linear cryptanalysis (LC) is a known-plaintext attack (ciphertext-only in some 
cases) introduced by Matsui [18]. The more powerful version is known as Algorithm 2
(Algorithm 1 extracts only a single subkey bit). Algorithm 2 can be
used to extract (pieces of) the round-1 subkey, $\mathbf{k}^1$. Once $\mathbf{k}^1$ is known, round 1
can be stripped off, and LC can be reapplied to obtain $\mathbf{k}^2$, and so on.

We do not give the details of LC here, as it is treated in many papers [18,3,14,15].
It suffices to say that the attacker wants to find input/output masks $\mathbf{a}, \mathbf{b} \in
\{0,1\}^N$ for the bijective mapping consisting of rounds $2 \ldots R$, for which $LP(\mathbf{a}, \mathbf{b}; \mathbf{k})$
is maximal. Based on this value, the attacker can determine the number of known
(plaintext, ciphertext) pairs, $N_L$ (called the data complexity), required for a successful
attack. Given an assumption about the behavior of round-1 output [18],
Matsui shows that if

$$N_L = \frac{c}{LP(\mathbf{a}, \mathbf{b}; \mathbf{k})},$$

then Algorithm 2 has the success rates in Table 1, for various values of the
constant, $c$. Note that this is the same as Table 3 in [18], except that the constant
values differ by a factor of 4, since Matsui uses bias values, not LP values.

Remark 2. The table in [15] corresponding to Table 1 has an error, in that the 
constants have not been multiplied by 4 to reflect the use of LP values. 
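The relation between an LP bound and the data complexity is simple enough to tabulate. The following sketch uses the constants of Table 1 and the bounds quoted in the abstract; the dictionary and function names are illustrative. Using the constant 8 instead of 32 for a 96.7% success rate, which is the error described in Remark 2, shifts the result by a factor of 4.

```python
# Illustrative arithmetic: data complexity N_L = c / LP for Algorithm 2,
# with the constants c of Table 1.
from fractions import Fraction

SUCCESS_CONSTANT = {"48.6%": 8, "78.5%": 16, "96.7%": 32, "99.9%": 64}

def data_complexity(lp_value, success_rate="96.7%"):
    """Number of known (plaintext, ciphertext) pairs for the given LP value."""
    return Fraction(SUCCESS_CONSTANT[success_rate]) / Fraction(lp_value)

def log2_exact(value):
    """Base-2 logarithm of a positive integer power of two."""
    n = int(value)
    if n != value or n <= 0 or n & (n - 1):
        raise ValueError("not a positive integer power of two")
    return n.bit_length() - 1

if __name__ == "__main__":
    for exponent in (75, 92):
        n_l = data_complexity(Fraction(1, 2 ** exponent))
        print("UB = 2^-%d gives N_L = 2^%d" % (exponent, log2_exact(n_l)))
```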



Notational Issues. Above, we have discussed input and output masks and
the associated LP values for rounds $2 \ldots R$ of an $R$-round cipher. It is useful to
consider these and other related concepts as applying to any $T \ge 2$ consecutive
"core" rounds (we say that these are the rounds being approximated). For Algorithm 2
as outlined above, $T = R - 1$, and the "first round," or "round 1," is
actually round 2 of the cipher.

We use superscripts for individual rounds, so $LP^t(\mathbf{a}, \mathbf{b}; \mathbf{k}^t)$ and $ELP^t(\mathbf{a}, \mathbf{b})$
are LP and ELP values, respectively, for round $t$. On the other hand, we use $t$
as a subscript to refer to values which apply to the first $t$ rounds as a unit, so,
for example, $ELP_t(\mathbf{a}, \mathbf{b})$ is an ELP value over rounds $1 \ldots t$.

Table 1. Success rates for LC Algorithm 2

c               8        16       32       64
Success rate    48.6%    78.5%    96.7%    99.9%

4.3 Linear Characteristics 

For fixed $\mathbf{a}, \mathbf{b} \in \{0,1\}^N$, direct computation of $LP_T(\mathbf{a}, \mathbf{b}; \mathbf{k})$ for $T$ core rounds
is generally infeasible, first since it requires encrypting all $N$-bit vectors through
rounds $1 \ldots T$, and second because of the dependence on an unknown key. The
latter difficulty is usually handled by working instead with the expected value
$ELP_T(\mathbf{a}, \mathbf{b})$. The data complexity of Algorithm 2 for masks $\mathbf{a}$ and $\mathbf{b}$ is now taken
to be

$$N_L = \frac{c}{ELP_T(\mathbf{a}, \mathbf{b})}. \tag{3}$$

The implicit assumption is that $LP_T(\mathbf{a}, \mathbf{b}; \mathbf{k})$ is approximately equal to $ELP_T(\mathbf{a}, \mathbf{b})$
for almost all values of $\mathbf{k}$ (this derives from the Hypothesis of Stochastic Equivalence
in [17]).

The problem of computational complexity is usually treated by approximating
$ELP_T(\mathbf{a}, \mathbf{b})$ through the use of linear characteristics (or simply characteristics).
A $T$-round characteristic is a $(T + 1)$-tuple $\Omega = (\mathbf{a}^1, \mathbf{a}^2, \ldots, \mathbf{a}^T, \mathbf{a}^{T+1})$.
We view $\mathbf{a}^t$ and $\mathbf{a}^{t+1}$ as input and output masks, respectively, for round $t$.

Definition 2. Let $\Omega = (\mathbf{a}^1, \mathbf{a}^2, \ldots, \mathbf{a}^T, \mathbf{a}^{T+1})$ be a $T$-round characteristic. The
linear characteristic probability (LCP) and expected LCP (ELCP) of $\Omega$ are
defined as

$$LCP(\Omega; \mathbf{k}) = \prod_{t=1}^{T} LP^t(\mathbf{a}^t, \mathbf{a}^{t+1}; \mathbf{k}^t)$$

$$ELCP(\Omega) = \prod_{t=1}^{T} ELP^t(\mathbf{a}^t, \mathbf{a}^{t+1}).$$



4.4 Choosing the Best Characteristic 

In carrying out LC, the attacker typically runs an algorithm to find the $T$-round
characteristic, $\Omega$, for which $ELCP(\Omega)$ is maximal; such a characteristic (not necessarily
unique) is called the best characteristic [19]. If $\Omega = (\mathbf{a}^1, \mathbf{a}^2, \ldots, \mathbf{a}^T, \mathbf{a}^{T+1})$,
and if the input and output masks used in Algorithm 2 are taken to be $\mathbf{a} = \mathbf{a}^1$
and $\mathbf{b} = \mathbf{a}^{T+1}$, respectively, then $ELP_T(\mathbf{a}, \mathbf{b})$ (used to determine $N_L$ in (3)) is
approximated by

$$ELP_T(\mathbf{a}, \mathbf{b}) \approx ELCP(\Omega). \tag{4}$$
The approximation in (4) has been widely used to evaluate the security of block
ciphers against LC [12,14]. Knudsen calls a block cipher practically secure if 
the data complexity determined by this method is prohibitive [16]. However, by 
introducing the concept of linear hulls, Nyberg demonstrated that the above 
approach can underestimate the success of LC [21]. 
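
One way such a search can be organised, sketched here for a toy cipher with a single 4-bit s-box per round, is dynamic programming over the intermediate masks. This is only an illustrative stand-in for the search algorithm of [19]; the s-box, the function names, and the assumed LP formula (2 Pr[a.x = b.S(x)] - 1)^2 are hypothetical choices for this example.

```python
from fractions import Fraction
from itertools import product

# Toy 4-bit s-box (illustrative assumption, not from the paper).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def dot(u, v):
    """Inner product over GF(2) of two bit vectors stored as integers."""
    return bin(u & v).count("1") & 1


def lp_table(sbox):
    """table[a][b] = (2 * Pr[a.x = b.S(x)] - 1)^2 for all mask pairs."""
    n = len(sbox)
    return [[(Fraction(2 * sum(dot(a, x) == dot(b, sbox[x]) for x in range(n)), n) - 1) ** 2
             for b in range(n)] for a in range(n)]


def best_characteristic(table, rounds, a, b):
    """Maximise the product of one-round values over all characteristics
    (a, x^2, ..., x^T, b); returns (value, characteristic or None)."""
    n = len(table)
    # best[x] = (largest product of a partial characteristic ending in x, that path)
    best = {a: (Fraction(1), (a,))}
    for r in range(1, rounds + 1):
        targets = [b] if r == rounds else range(n)
        nxt = {}
        for x, (value, path) in best.items():
            for y in targets:
                candidate = value * table[x][y]
                if candidate > 0 and (y not in nxt or candidate > nxt[y][0]):
                    nxt[y] = (candidate, path + (y,))
        best = nxt
    return best.get(b, (Fraction(0), None))
```

Because all one-round values are nonnegative, keeping only the best partial product per intermediate mask is sufficient, so the cost grows linearly in the number of rounds rather than exponentially.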



4.5 Linear Hulls 

Definition 3 (Nyberg). Given $N$-bit masks $a$, $b$, the corresponding linear hull,
denoted $ALH(a, b)$,$^5$ is the set of all $T$-round characteristics (for the $T$ rounds
under consideration) having $a$ as the input mask for round 1 and $b$ as the output
mask for round $T$, i.e., all characteristics of the form

$$\Omega = (a, a^2, a^3, \ldots, a^T, b) .$$

Theorem 1 (Nyberg). Let $a, b \in \{0,1\}^N$. Then

$$ELP_T(a, b) = \sum_{\Omega \in ALH(a, b)} ELCP(\Omega) .$$

It follows immediately from Theorem 1 that (4) does not hold in general, since
$ELP_T(a, b)$ is seen to be equal to a sum of terms $ELCP(\Omega)$ over a (large) set of
characteristics, and therefore, in general, the ELCP of any characteristic will be
strictly less than the corresponding ELP value. This is referred to as the linear
hull effect. An important consequence is that an attacker may overestimate the
number of (plaintext, ciphertext) pairs required for a given success rate.

Remark 3. It can be shown that the linear hull effect is significant for Rijndael,
since, for example, the ELCP of any characteristic over $T = 8$ rounds is upper
bounded by $2^{-300}$ [6],$^6$ but the largest ELP value has $2^{-128}$ as a trivial lower
bound.$^7$

The next lemma follows easily from Theorem 1 and Definition 2 (recall the
conventions for superscripts and subscripts).

Lemma 3. Let $T \ge 2$, and let $a, b \in \{0,1\}^N$. Then

$$ELP_T(a, b) = \sum_{x \in \{0,1\}^N} ELP_{T-1}(a, x) \cdot ELP^T(x, b) .$$
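
The identity in Lemma 3 can be checked numerically on a very small example. The sketch below assumes a toy two-round cipher (round = key XOR followed by one 4-bit s-box, independent uniform round keys) and the LP formula (2 Pr[a.x = b.f(x)] - 1)^2; the s-box and all function names are hypothetical. It compares the key-averaged LP of the two-round map with the sum over intermediate masks, and also shows that a single characteristic never exceeds the hull value.

```python
from fractions import Fraction
from itertools import product

# Toy 4-bit s-box (illustrative assumption, not from the paper).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def dot(u, v):
    """Inner product over GF(2) of two bit vectors stored as integers."""
    return bin(u & v).count("1") & 1


def lp(table, a, b):
    """LP of the function given by a lookup table, for masks a and b."""
    agree = sum(1 for x in range(len(table)) if dot(a, x) == dot(b, table[x]))
    return (Fraction(2 * agree, len(table)) - 1) ** 2


def elp_two_rounds(a, b):
    """Average LP of x -> S(S(x ^ k1) ^ k2) over all round keys (k1, k2)."""
    total = Fraction(0)
    for k1, k2 in product(range(16), repeat=2):
        two_rounds = [SBOX[SBOX[x ^ k1] ^ k2] for x in range(16)]
        total += lp(two_rounds, a, b)
    return total / 256


def lemma3_sum(a, b):
    """Sum over intermediate masks x of LP(a, x) * LP(x, b)."""
    return sum(lp(SBOX, a, x) * lp(SBOX, x, b) for x in range(16))
```

With a single s-box per round and no linear transformation, each term of the sum is the ELCP of one two-round characteristic, so the gap between the largest term and the total is a small-scale picture of the linear hull effect.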

$^5$ Nyberg [21] originally used the term approximate linear hull, hence the abbreviation
ALH, which we retain for consistency with [15].

$^6$ Any 8-round characteristic, $\Omega$, has a minimum of 50 active s-boxes, and the
maximum LP value for the Rijndael s-box is $2^{-6}$, so $ELCP(\Omega) \le (2^{-6})^{50} = 2^{-300}$.

$^7$ This follows by observing that Lemma 1 is contradicted if the maximum ELP value
is less than $2^{-128}$.

4.6 Maximum Average Linear Hull Probability

An SPN is considered to be provably secure against LC if the maximum ELP,

$$\max_{a, b \in \{0,1\}^N \setminus 0} ELP_T(a, b), \qquad (5)$$

is sufficiently small that the resulting data complexity is prohibitive for any
conceivable attacker.$^8$ The value in (5) is also called the maximum average linear
hull probability (MALHP). We retain this terminology for consistency with [15].

Since evaluation of the MALHP appears to be infeasible in general, researchers
have adopted the approach of upper bounding this value [2,13,15].
If such an upper bound is sufficiently small, provable security can be claimed.



5 SPN-Specific Considerations 

In the current section, we adapt certain results from Section 4 to the SPN model.
Note that where matrix multiplication is involved, we view all vectors as column
vectors. Also, if $\mathcal{M}$ is a matrix, $\mathcal{M}'$ denotes the transpose of $\mathcal{M}$.

Lemma 4. Consider $T$ core SPN rounds. Let $1 \le t \le T$, and $a, b, k^t \in \{0,1\}^N$.
Then $LP^t(a, b; k^t)$ is independent of $k^t$, and therefore

$$LP^t(a, b; k^t) = ELP^t(a, b) .$$

Proof. Follows by observing the interchangeable roles of the round input, $x$, and
$k^t$, and from a simple change of variables $\tilde{x} = x \oplus k^t$ when evaluating (1).

Corollary 1. Let $\Omega$ be a $T$-round characteristic for an SPN. Then
$LCP(\Omega) = ELCP(\Omega)$.



Definition 4. Let $L$ denote the $N$-bit LT of the SPN represented as a binary
$N \times N$ matrix, i.e., if $x, y \in \{0,1\}^N$ are the input and output, respectively, for
the LT, then $y = Lx$.

Lemma 5 ([5]). If $b \in \{0,1\}^N$ and $a = L'b$, then $a \cdot x = b \cdot y$ for all $N$-bit
inputs to the LT, $x$, and corresponding outputs, $y$ (i.e., if $b$ is an output mask
for the LT, then $a = L'b$ is the (unique) corresponding input mask).
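
The relation in Lemma 5 is easy to confirm by exhaustive check on a small matrix. The following sketch assumes an arbitrary 8 x 8 binary matrix stored as a list of row bitmasks; the block size, the seed, and the function names are illustrative choices, not part of the paper.

```python
import random

N = 8  # toy block size in bits (assumption for this example)


def dot(u, v):
    """Inner product over GF(2) of two bit vectors stored as integers."""
    return bin(u & v).count("1") & 1


def apply_lt(rows, x):
    """y = Lx over GF(2); rows[i] holds row i of L as a bitmask."""
    y = 0
    for i, row in enumerate(rows):
        y |= dot(row, x) << i
    return y


def apply_lt_transpose(rows, b):
    """a = L'b: XOR together the rows of L selected by the bits of b."""
    a = 0
    for i, row in enumerate(rows):
        if (b >> i) & 1:
            a ^= row
    return a


rng = random.Random(2001)
L_ROWS = [rng.randrange(1 << N) for _ in range(N)]  # arbitrary binary matrix
```

The identity a.x = b.y holds for any binary matrix; invertibility of the LT is only needed for the uniqueness part of the lemma.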

It follows from Lemma 5 that if $a^t$ and $a^{t+1}$ are input and output masks
for round $t$, respectively, then the resulting input and output masks for the
substitution stage of round $t$ are $a^t$ and $b^t = L'a^{t+1}$. Further, $a^t$ and $b^t$ determine
input and output masks for each s-box in round $t$. Let the masks for $S_i^t$ be
denoted $a_i^t$ and $b_i^t$, for $1 \le i \le M$ (we number s-boxes from left to right). Then
from Matsui’s Piling-up Lemma [18] and Lemma 4,

$$ELP^t(a^t, a^{t+1}) = \prod_{i=1}^{M} LP^{S_i^t}(a_i^t, b_i^t) . \qquad (6)$$

$^8$ For Algorithm 2 as described above, this must hold for $T = R - 1$. Since variations
of LC can be used to attack the first and last SPN rounds simultaneously, it may
also be important that the data complexity remain prohibitive for $T = R - 2$.

From the above, any characteristic $\Omega \in ALH(a, b)$ determines an input and an
output mask for each s-box in rounds $1 \ldots T$. If this yields at least one s-box for
which the input mask is zero and the output mask is nonzero, or vice versa, the
linear probability associated with that s-box will trivially be 0, and therefore
$ELCP(\Omega) = 0$ by (6) and Definition 2. We exclude such characteristics from
consideration via the following definition.

Definition 5. For $a, b \in \{0,1\}^N$, let $ALH(a, b)^*$ consist of the elements
$\Omega \in ALH(a, b)$ such that for each s-box in rounds $1 \ldots T$, the input and output masks
determined by $\Omega$ for that s-box are either both zero or both nonzero.

Remark 4. In [23], the characteristics in $ALH(a, b)^*$ are called consistent.

Definition 6 ([3]). Any $T$-round characteristic, $\Omega$, determines an input and
an output mask for each s-box in rounds $1 \ldots T$. Those s-boxes having nonzero
input and output masks are called active.

Definition 7. Let $v$ be an input or an output mask for the substitution stage of
round $t$. Then the active s-boxes in round $t$ can be determined from $v$ (without
knowing the corresponding output/input mask). We define $\gamma_v$ to be the $M$-bit
vector which encodes the pattern of active s-boxes: $\gamma_v = \gamma_1 \gamma_2 \ldots \gamma_M$, where $\gamma_i = 1$
if the $i$-th s-box is active, and $\gamma_i = 0$ otherwise, for $1 \le i \le M$.

Definition 8 ([15]). Let $\gamma, \hat{\gamma} \in \{0,1\}^M$. Then

$$W[\gamma, \hat{\gamma}] \stackrel{\mathrm{def}}{=} \# \{ y \in \{0,1\}^N : \gamma_x = \gamma, \ \gamma_y = \hat{\gamma}, \ \text{where } x = L'y \} .$$

Remark 5. Informally, the value $W[\gamma, \hat{\gamma}]$ represents the number of ways the LT
can “connect” a pattern of active s-boxes in one round ($\gamma$) to a pattern of active
s-boxes in the next round ($\hat{\gamma}$).
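
To make the counting in Definitions 7 and 8 concrete, the sketch below tabulates the connection counts for a toy SPN layer with two 4-bit s-boxes and an 8-bit linear map built from rotations. The linear map, the sizes, and the names are assumptions chosen for this example; they are unrelated to the Rijndael LT.

```python
from collections import Counter

N_BITS, SBOX_BITS = 8, 4
M = N_BITS // SBOX_BITS  # number of s-boxes per round in the toy layer


def dot(u, v):
    """Inner product over GF(2) of two bit vectors stored as integers."""
    return bin(u & v).count("1") & 1


def rotl8(x, r):
    """Rotate an 8-bit value left by r positions."""
    return ((x << r) | (x >> (8 - r))) & 0xFF


def lt(x):
    """Toy linear transformation (hypothetical): x + rot(x,1) + rot(x,2)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2)


def lt_transpose(y):
    """Apply L' to y: component j of L'y equals y . L(e_j)."""
    out = 0
    for j in range(N_BITS):
        if dot(y, lt(1 << j)):
            out |= 1 << j
    return out


def pattern(v):
    """Active s-box pattern of mask v; s-boxes are numbered left to right."""
    return tuple(1 if (v >> (SBOX_BITS * (M - 1 - i))) & 0xF else 0
                 for i in range(M))


def connection_counts():
    """W[(g, g_hat)] = number of y with pattern(L'y) = g and pattern(y) = g_hat."""
    counts = Counter()
    for y in range(1 << N_BITS):
        counts[(pattern(lt_transpose(y)), pattern(y))] += 1
    return counts
```

Every mask y is counted exactly once, so the entries of the table sum to 2^N; for Rijndael the same table has (2^16)^2 entries and is computed from the structure of the LT rather than by enumeration of all 2^128 masks.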

We now proceed to our improved method for upper bounding the MALHP for 
Rijndael. 

6 Improved Upper Bound on MALHP for Rijndael 

6.1 Technical Lemmas 

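Lemma 6 ([15]). Let $m \ge 2$, and suppose $\{c_i\}_{i=1}^{m}$ and $\{d_i\}_{i=1}^{m}$ are sequences of
nonnegative values. Let $\{\dot{c}_i\}_{i=1}^{m}$ and $\{\dot{d}_i\}_{i=1}^{m}$ be the sequences obtained by sorting
$\{c_i\}$ and $\{d_i\}$, respectively, in nonincreasing order. Then
$\sum_{i=1}^{m} c_i d_i \le \sum_{i=1}^{m} \dot{c}_i \dot{d}_i$.

Lemma 7 ([15]). Suppose $\{\dot{c}_i\}_{i=1}^{m}$, $\{\tilde{c}_i\}_{i=1}^{m}$, and $\{\tilde{d}_i\}_{i=1}^{m}$ are sequences of
nonnegative values, with $\{\tilde{d}_i\}$ sorted in nonincreasing order. Suppose there exists
$\tilde{m}$, $1 \le \tilde{m} \le m$, such that

(a) $\tilde{c}_i \ge \dot{c}_i$, for $1 \le i \le \tilde{m}$

(b) $\tilde{c}_i \le \dot{c}_i$, for $(\tilde{m} + 1) \le i \le m$

(c) $\sum_{i=1}^{m} \dot{c}_i \le \sum_{i=1}^{m} \tilde{c}_i$

Then $\sum_{i=1}^{m} \dot{c}_i \tilde{d}_i \le \sum_{i=1}^{m} \tilde{c}_i \tilde{d}_i$.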
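
The sorted-pairing bound of Lemma 6 (a form of the rearrangement inequality) can be sanity-checked by brute force on short sequences. The helper name and the sample values below are illustrative assumptions.

```python
from itertools import permutations


def sorted_pairing_sum(c, d):
    """Pair the largest c with the largest d, the second largest with the
    second largest, and so on, and return the sum of the products."""
    return sum(x * y for x, y in zip(sorted(c, reverse=True),
                                     sorted(d, reverse=True)))
```

For example, with c = [3, 0, 5, 1] and d = [2, 7, 1, 4], no way of pairing the entries of c with those of d gives a larger sum of products than the sorted pairing.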

6.2 Distribution of LP Values for Multiple Active S-boxes 

Definition 9. Let $a \in \{0,1\}^{128} \setminus 0$ be a fixed input mask for the substitution
stage of Rijndael, and let $b$ be an output mask which varies over $\{0,1\}^{128}$, with
the restriction that $\gamma_a = \gamma_b$. If $A$ is the number of s-boxes made active
($A = wt(\gamma_a)$), define $\mathcal{D}_A$ to be the set of distinct LP values produced as $b$ varies, and
let $D_A = \# \mathcal{D}_A$. Define $\rho_1^A, \rho_2^A, \ldots, \rho_{D_A}^A$ to be the sequence obtained by sorting
$\mathcal{D}_A$ in decreasing order, and let $\phi_j^A$ be the number of occurrences of the value
$\rho_j^A$, for $1 \le j \le D_A$.

Note that if $A = 1$, then $D_A = 9$, and $\rho_j^1$ and $\phi_j^1$ are as given in Lemma 2.

Lemma 8. For $A \ge 2$,

$$\mathcal{D}_A = \{ \rho_s^1 \cdot \rho_t^{A-1} : 1 \le s \le D_1, \ 1 \le t \le D_{A-1} \} ,$$

and for each $j$, $1 \le j \le D_A = \# \mathcal{D}_A$,

$$\phi_j^A = \sum \{ \phi_s^1 \cdot \phi_t^{A-1} : \rho_s^1 \cdot \rho_t^{A-1} = \rho_j^A, \ 1 \le s \le D_1, \ 1 \le t \le D_{A-1} \} .$$

Proof. Follows easily from Lemma 4 and (6).
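
The recursive structure in Lemma 8 can be sketched as a multiplicative combination of value/count tables. The code below builds the Rijndael s-box from its standard description (inversion in GF(2^8) followed by the affine map), tabulates the LP values for one active s-box, and combines tables for more active s-boxes. The function names are hypothetical, the LP formula (2 Pr[a.x = b.S(x)] - 1)^2 is assumed, and the loop over nonzero output masks reflects the restriction that the active pattern is preserved; this is a sketch, not the authors' program.

```python
from collections import Counter
from fractions import Fraction


def gf256_mul(x, y):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11B
        y >>= 1
    return result


def rijndael_sbox():
    """Field inversion (0 maps to 0) followed by the affine transformation."""
    sbox = []
    for value in range(256):
        inv = 0
        if value:
            inv = 1
            for _ in range(254):  # value^254 is the multiplicative inverse
                inv = gf256_mul(inv, value)
        out = inv
        for shift in range(1, 5):
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(out ^ 0x63)
    return sbox


def dot(u, v):
    """Inner product over GF(2) of two bit vectors stored as integers."""
    return bin(u & v).count("1") & 1


def single_sbox_distribution(sbox, a):
    """Map each distinct LP value to its number of occurrences as the
    output mask b ranges over the nonzero 8-bit masks (one active s-box)."""
    dist = Counter()
    for b in range(1, 256):
        agree = sum(1 for x in range(256) if dot(a, x) == dot(b, sbox[x]))
        dist[(Fraction(2 * agree, 256) - 1) ** 2] += 1
    return dist


def combine(dist_one, dist_rest):
    """Distribution for A active s-boxes from those for 1 and A-1:
    values multiply, occurrence counts multiply and accumulate."""
    out = Counter()
    for p, n in dist_one.items():
        for q, m in dist_rest.items():
            out[p * q] += n * m
    return out
```

Sorting the keys of such a table in decreasing order gives the sequence of distinct values, and the associated counts are the numbers of occurrences; the weighted total stays equal to 1 at every stage, which is a convenient consistency check.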

Definition 10. For $A \ge 1$ and $1 \le J \le D_A$, we define the partial sums

$$\Phi_J^A = \sum_{j=1}^{J} \phi_j^A$$

$$\Lambda_J^A = \sum_{j=1}^{J} \rho_j^A \cdot \phi_j^A .$$

Also, we define $S_A$ to be the sequence

$$\underbrace{\rho_1^A, \ldots, \rho_1^A}_{\phi_1^A \text{ terms}}, \ \underbrace{\rho_2^A, \ldots, \rho_2^A}_{\phi_2^A \text{ terms}}, \ \ldots, \ \underbrace{\rho_{D_A}^A, \ldots, \rho_{D_A}^A}_{\phi_{D_A}^A \text{ terms}} .$$

Remark 6. For $1 \le A \le M$, $\Lambda_{D_A}^A = 1$ by Lemma 1.

6.3 Derivation of Improved Upper Bound

Convention: In this subsection, whenever we deal with values of the form
$ELP_t(a, b)$ or $ELP^t(a, b)$ ($1 \le t \le T$), we omit the LT from round $t$. This
is simply a technical matter that simplifies the proofs which follow.

Let $T \ge 2$. As in [15], our approach is to compute an upper bound for each
nonzero pattern of active s-boxes in round 1 and round $T$; that is, we compute
$UB_T[\gamma, \hat{\gamma}]$ for $\gamma, \hat{\gamma} \in \{0,1\}^M \setminus 0$, such that the following holds:

UB Property for $T$. For all $a, b \in \{0,1\}^N \setminus 0$, $ELP_T(a, b) \le UB_T[\gamma_a, \gamma_b]$.

If the UB Property for $T$ holds, then the MALHP is upper bounded by

$$\max_{\gamma, \hat{\gamma} \in \{0,1\}^M \setminus 0} UB_T[\gamma, \hat{\gamma}] .$$

The case $T = 2$ is handled in Theorem 2, and the case $T \ge 3$ in Theorem 3.

Theorem 2. Let the values $UB_2[\gamma, \hat{\gamma}]$ be computed using the algorithm in Figure 2.
Then the UB Property for 2 holds.

Proof. In this proof, “Line X” refers to the X-th line in Figure 2. Let
$\gamma, \hat{\gamma} \in \{0,1\}^M \setminus 0$ be fixed, and let $a, b \in \{0,1\}^N \setminus 0$ such that $\gamma_a = \gamma$ and
$\gamma_b = \hat{\gamma}$. We want to show that $ELP_2(a, b) \le UB_2[\gamma, \hat{\gamma}]$. There are $W = W[\gamma, \hat{\gamma}]$
ways that the LT can “connect” the $f$ active s-boxes in round 1 to the $\ell$ active
s-boxes in round 2. Let $x_1, x_2, \ldots, x_W$ be the corresponding output masks for
the substitution stage of round 1 (and therefore the input masks for the round-1
LT), and let $y_1, y_2, \ldots, y_W$ be the respective output masks for the round-1 LT
(and therefore the input masks for the substitution stage of round 2). So $\gamma_{x_i} = \gamma$
and $\gamma_{y_i} = \hat{\gamma}$, for $1 \le i \le W$. Let $c_i = ELP^1(a, x_i)$ and $d_i = ELP^2(y_i, b)$, for
$1 \le i \le W$. It follows from Lemma 3 that $ELP_2(a, b) = \sum_{i=1}^{W} c_i d_i$.

Without loss of generality, $f \le \ell$, so $A_{min} = f$ and $A_{max} = \ell$. Let $\{\dot{c}_i\}$ ($\{\dot{d}_i\}$)
be the sequence obtained by sorting $\{c_i\}$ ($\{d_i\}$) in nonincreasing order. Then
$\sum_{i=1}^{W} c_i d_i \le \sum_{i=1}^{W} \dot{c}_i \dot{d}_i$ by Lemma 6. Let $\{\tilde{c}_i\}$ ($\{\tilde{d}_i\}$) consist of the first $W$ terms
of $S_f$ ($S_\ell$). Since the terms $c_i$ ($d_i$) are elements of $S_f$ ($S_\ell$), it follows that
$\dot{c}_i \le \tilde{c}_i$ ($\dot{d}_i \le \tilde{d}_i$), for $1 \le i \le W$, so

$$ELP_2(a, b) = \sum_{i=1}^{W} c_i d_i \le \sum_{i=1}^{W} \dot{c}_i \dot{d}_i \le \sum_{i=1}^{W} \tilde{c}_i \tilde{d}_i .$$

It is not hard to see that the value $UB_2[\gamma, \hat{\gamma}]$ computed in Figure 2 is exactly
$\sum_{i=1}^{W} \tilde{c}_i \tilde{d}_i$. For computational efficiency, we do not sum “element-by-element”
(i.e., for each $i$), but instead take advantage of the fact that $\{\tilde{c}_i\}$ has the form

$$\underbrace{\rho_1^f, \ldots, \rho_1^f}_{\phi_1^f \text{ terms}}, \ \underbrace{\rho_2^f, \ldots, \rho_2^f}_{\phi_2^f \text{ terms}}, \ \underbrace{\rho_3^f, \ldots, \rho_3^f}_{\phi_3^f \text{ terms}}, \ \ldots$$

and similarly for $\{\tilde{d}_i\}$ (replace $f$ with $\ell$). Viewing these sequences as “groups”
of consecutive identical elements, the algorithm in Figure 2 proceeds “group-by-group.”
The variable $h$ is the index of the current group in $\{\tilde{c}_i\}$. The function
NextTerm2() identifies the corresponding elements in $\{\tilde{d}_i\}$, and computes the
equivalent of the element-by-element product, which is added to the growing
sum in Line 9. The situation in which $\{\tilde{c}_i\}_{i=1}^{W}$ is a truncated version of $S_f$ is
handled by the conditional statement in Lines 11-12.

 1. For each $\gamma \in \{0,1\}^M \setminus 0$
 2.   For each $\hat{\gamma} \in \{0,1\}^M \setminus 0$
 3.     $W \leftarrow W[\gamma, \hat{\gamma}]$
 4.     $f \leftarrow wt(\gamma)$, $\ell \leftarrow wt(\hat{\gamma})$
 5.     $A_{min} \leftarrow \min\{f, \ell\}$, $A_{max} \leftarrow \max\{f, \ell\}$
 6.     $\Lambda \leftarrow 0$, Sum $\leftarrow 0$
 7.     $h \leftarrow 1$
 8.     While ($h \le D_{A_{min}}$) and ($\Phi_h^{A_{min}} \le W$)
 9.       Sum $\leftarrow$ Sum + NextTerm2($\Phi_h^{A_{min}}$)
10.       $h \leftarrow h + 1$
11.     If ($h \le D_{A_{min}}$) and ($\Phi_h^{A_{min}} > W$)
12.       Sum $\leftarrow$ Sum + NextTerm2($W$)
13.     $UB_2[\gamma, \hat{\gamma}] \leftarrow$ Sum

14. Function NextTerm2($Z$)
15.   $J \leftarrow \min\{ j : 1 \le j \le D_{A_{max}}, \ \Phi_j^{A_{max}} \ge Z \}$
16.   $\Delta\Lambda \leftarrow (\Lambda_J^{A_{max}} - \Lambda) - [(\Phi_J^{A_{max}} - Z) * \rho_J^{A_{max}}]$
17.   $\Lambda \leftarrow \Lambda + \Delta\Lambda$
18.   return ($\rho_h^{A_{min}} * \Delta\Lambda$)

Fig. 2. Algorithm to compute $UB_2[\ ]$
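
The group-by-group idea can be illustrated with a simplified two-pointer variant. This is a sketch under the assumption that each sequence is supplied as (value, count) groups in nonincreasing order of value; it is not the NextTerm2 routine with partial sums described above, and the function names are hypothetical. It is compared against a plain element-by-element sum.

```python
from fractions import Fraction


def truncated_pairing_sum(groups_c, groups_d, w):
    """Sum of c_i * d_i over positions 1..w, where each sequence is given as
    (value, count) groups listed in nonincreasing order of value."""
    total = Fraction(0)
    ic = jd = 0
    left_c = groups_c[0][1]  # elements still unused in the current c group
    left_d = groups_d[0][1]  # elements still unused in the current d group
    remaining = w
    while remaining > 0:
        take = min(left_c, left_d, remaining)
        total += groups_c[ic][0] * groups_d[jd][0] * take
        remaining -= take
        left_c -= take
        left_d -= take
        if left_c == 0:
            ic += 1
            if ic == len(groups_c):
                break
            left_c = groups_c[ic][1]
        if left_d == 0:
            jd += 1
            if jd == len(groups_d):
                break
            left_d = groups_d[jd][1]
    return total


def elementwise_sum(groups_c, groups_d, w):
    """Reference version: expand both sequences and sum term by term."""
    c = [v for v, n in groups_c for _ in range(n)]
    d = [v for v, n in groups_d for _ in range(n)]
    return sum((x * y for x, y in zip(c[:w], d[:w])), Fraction(0))
```

The grouped version does work proportional to the number of groups rather than to W, which matters when the counts are as large as those arising from several active s-boxes.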



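Theorem 3. Let $T \ge 3$. Assume that the values $UB_{T-1}[\gamma, \hat{\gamma}]$ have been computed
for all $\gamma, \hat{\gamma} \in \{0,1\}^M \setminus 0$ such that the UB Property for $(T - 1)$ holds. Let
the values $UB_T[\gamma, \hat{\gamma}]$ be computed using the algorithm in Figure 3. Then the UB
Property for $T$ holds.

 1. For each $\gamma \in \{0,1\}^M \setminus 0$
 2.   For each $\hat{\gamma} \in \{0,1\}^M \setminus 0$
 3.     $\ell \leftarrow wt(\hat{\gamma})$
 4.     $\Gamma \leftarrow \{ \xi \in \{0,1\}^M \setminus 0 : W[\xi, \hat{\gamma}] \ne 0 \}$
 5.     Order the $H = \#\Gamma$ elements of $\Gamma$ as $\gamma_1, \gamma_2, \ldots, \gamma_H$ such that
 6.       $UB_{T-1}[\gamma, \gamma_1] \ge UB_{T-1}[\gamma, \gamma_2] \ge \cdots \ge UB_{T-1}[\gamma, \gamma_H]$
 7.     $U_h \leftarrow UB_{T-1}[\gamma, \gamma_h]$, for $1 \le h \le H$
 8.     $W_h \leftarrow W[\gamma_h, \hat{\gamma}]$, for $1 \le h \le H$
 9.     $\Psi \leftarrow 0$, $\Lambda \leftarrow 0$, $W_{total} \leftarrow 0$, Sum $\leftarrow 0$
10.     $h \leftarrow 1$
11.     While ($h \le H$) and ($U_h > 0$) and ($\Psi + (U_h * W_h) \le 1$) and ($\Lambda < 1$)
12.       $W_{total} \leftarrow W_{total} + W_h$
13.       Sum $\leftarrow$ Sum + NextTermT($W_{total}$)
14.       $h \leftarrow h + 1$
15.     If ($h \le H$) and ($U_h > 0$) and ($\Psi + (U_h * W_h) > 1$) and ($\Lambda < 1$)
16.       $W_{total} \leftarrow W_{total} + (1 - \Psi)/U_h$
17.       Sum $\leftarrow$ Sum + NextTermT($W_{total}$)
18.     $UB_T[\gamma, \hat{\gamma}] \leftarrow$ Sum

19. Function NextTermT($Z$)
20.   $J \leftarrow \min\{ j : 1 \le j \le D_\ell, \ \Phi_j^\ell \ge Z \}$
21.   $\Delta\Lambda \leftarrow (\Lambda_J^\ell - \Lambda) - [(\Phi_J^\ell - Z) * \rho_J^\ell]$
22.   $\Psi \leftarrow \Psi + (U_h * W_h)$
23.   $\Lambda \leftarrow \Lambda + \Delta\Lambda$
24.   return ($U_h * \Delta\Lambda$)

Fig. 3. Algorithm to compute $UB_T[\ ]$ for $T \ge 3$

Proof. Throughout this proof, “Line X” refers to the X-th line in Figure 3. Let
$a, b \in \{0,1\}^N \setminus 0$. It suffices to show that if $\gamma = \gamma_a$ in Line 1 and $\hat{\gamma} = \gamma_b$ in
Line 2, then the value $UB_T[\gamma, \hat{\gamma}]$ computed in Figure 3 satisfies
$ELP_T(a, b) \le UB_T[\gamma, \hat{\gamma}]$. Enumerate the elements of $\{0,1\}^N \setminus 0$ as $y_1, y_2, \ldots, y_{2^N - 1}$. We view
these as input masks for round $T$, and hence as output masks for the LT of
round $(T - 1)$. For each $y_i$, let $x_i$ be the corresponding input mask for the LT. It
follows from Lemma 3 that $ELP_T(a, b) = \sum_{i=1}^{2^N - 1} ELP_{T-1}(a, x_i) \cdot ELP^T(y_i, b)$.
If $\gamma_{y_i} \ne \gamma_b$ ($= \hat{\gamma}$), then $ELP^T(y_i, b) = 0$ (this follows from (6)), so we remove
these $y_i$ from consideration, leaving $y_1, y_2, \ldots, y_L$ (for some $L$), and corresponding
input masks, $x_1, x_2, \ldots, x_L$, respectively.

Let $c_i = ELP_{T-1}(a, x_i)$ and $d_i = ELP^T(y_i, b)$, for $1 \le i \le L$. Then
$ELP_T(a, b) = \sum_{i=1}^{L} c_i d_i$. Note that $\sum_{i=1}^{L} c_i \le 1$, $\sum_{i=1}^{L} d_i \le 1$ by Lemma 1. Let
$\{\dot{c}_i\}$ ($\{\dot{d}_i\}$) be the sequence obtained by sorting $\{c_i\}$ ($\{d_i\}$) in nonincreasing
order. Then $\sum_{i=1}^{L} c_i d_i \le \sum_{i=1}^{L} \dot{c}_i \dot{d}_i$ by Lemma 6. If $\{\tilde{d}_i\}$ consists of the first $L$
terms of $S_\ell$ ($\ell = wt(\hat{\gamma})$ as in Line 3), then $\dot{d}_i \le \tilde{d}_i$ for $1 \le i \le L$ (since the $d_i$
are elements of $S_\ell$), so $\sum_{i=1}^{L} \dot{c}_i \dot{d}_i \le \sum_{i=1}^{L} \dot{c}_i \tilde{d}_i$.

Let $u_i = UB_{T-1}[\gamma_a, \gamma_{x_i}]$, for $1 \le i \le L$, and let $\{\dot{u}_i\}$ be obtained by sorting
$\{u_i\}$ in nonincreasing order. Clearly $\dot{c}_i \le \dot{u}_i$, for $1 \le i \le L$. Using notation from
Lines 4-8, $\{\dot{u}_i\}$ has the form

$$\underbrace{U_1, \ldots, U_1}_{W_1 \text{ terms}}, \ \underbrace{U_2, \ldots, U_2}_{W_2 \text{ terms}}, \ \underbrace{U_3, \ldots, U_3}_{W_3 \text{ terms}}, \ \ldots \qquad (7)$$

If $\sum_{i=1}^{L} \dot{u}_i \le 1$, let $\{\tilde{c}_i\}$ be identical to the sequence $\{\dot{u}_i\}$. If $\sum_{i=1}^{L} \dot{u}_i > 1$, let
$L_u$ ($1 \le L_u \le L$) be minimum such that $\sum_{i=1}^{L_u} \dot{u}_i > 1$, and let $\{\tilde{c}_i\}$ consist of the
first $L$ terms of

$$\dot{u}_1, \ \dot{u}_2, \ \ldots, \ \dot{u}_{L_u - 1}, \ 1 - \sum_{i=1}^{L_u - 1} \dot{u}_i, \ 0, \ 0, \ 0, \ \ldots \qquad (8)$$

It follows that $\sum_{i=1}^{L} \dot{c}_i \tilde{d}_i \le \sum_{i=1}^{L} \tilde{c}_i \tilde{d}_i$ by Lemma 7 (with $\{\tilde{d}_i\}$ playing the role
of the nonincreasing sequence in the statement of the lemma). Combining inequalities gives

$$ELP_T(a, b) \le \sum_{i=1}^{L} \tilde{c}_i \tilde{d}_i . \qquad (9)$$

The value $\sum_{i=1}^{L} \tilde{c}_i \tilde{d}_i$ in (9) is exactly the upper bound computed in Figure 3. We
argue similarly to the $T = 2$ case. Since $\{\tilde{c}_i\}$ and $\{\tilde{d}_i\}$ are derived from sequences
which consist of groups of consecutive identical elements (the sequence in (7)
and $S_\ell$, respectively), the algorithm operates group-by-group, not element-by-element.
Beginning at Line 10, the variable $h$ is the index of the current group
in $\{\tilde{c}_i\}$ (having element value $U_h$ and size $W_h$). Function NextTermT() identifies
the corresponding elements in $\{\tilde{d}_i\}$, and computes the equivalent of the
element-by-element product.

If the terms in $\{\tilde{c}_i\}$ (resp. $\{\tilde{d}_i\}$) shrink to 0 because the corresponding terms in
(7) (resp. $S_\ell$) become 0, the check ($U_h > 0$) (resp. ($\Lambda < 1$)) in Line 11 or Line 15
will fail, and the algorithm will exit. The check ($\Psi + (U_h * W_h) > 1$) in Line 15
detects the case that in the derivation of $\{\tilde{c}_i\}$ from $\{\dot{u}_i\}$ above, $\sum_{i=1}^{L} \dot{u}_i > 1$,
and therefore $\{\tilde{c}_i\}$ is based on the truncated sequence in (8).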



7 Computational Results 

We estimate that running the above algorithm to completion will take up to
200,000 hours on a single Sun Ultra 5. We are currently running on about 50
CPUs, and have completed 43% of the computation for $2 \le T \le 10$.

It is worth noting that in progressing from 11% to 43% of the computation,
there was no change in the upper bound for $2 \le T \le 10$. Combined with our
experience in running the algorithm of [15], for which the numbers also stabilized
quickly, we expect that the final results will be the same as those presented below.

In Figure 4, we plot our improved upper bound against that of [15] for
$2 \le T \le 10$. Note that the new bound is noticeably superior to that of [15] for
$T \ge 4$. When $T = 9$ rounds are being approximated, the upper bound value is
$UB = 2^{-92}$. For a success rate of 96.7%, this corresponds to a data complexity
of $\mathcal{N}_L = 2^{97}$ (Table 1). The corresponding upper bound value from [15] is $2^{-75}$,
for a data complexity of $2^{80}$. This represents a significant improvement in the
calculation of the provable security of Rijndael against linear cryptanalysis.
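
The conversion from an upper bound on the MALHP to a lower bound on the data complexity is a one-line calculation once the constant for the desired success rate is fixed. The sketch below assumes the relation N_L = c / UB with c = 32 for a 96.7% success rate; Table 1 is not reproduced in this section, so both the relation and the value of c should be read as assumptions, and the function name is hypothetical.

```python
import math


def log2_data_complexity(log2_upper_bound, c=32):
    """Return log2 of N_L = c / UB, given log2 of the upper bound UB.
    The default c = 32 is an assumed constant for a 96.7% success rate."""
    return math.log2(c) - log2_upper_bound
```

Under these assumptions a bound of 2^-92 gives 2^97 and a bound of 2^-75 gives 2^80, so each bit gained in the exponent of the upper bound doubles the number of (plaintext, ciphertext) pairs the attacker needs.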

We also plot very preliminary results for $11 \le T \le 15$, in order to gain a sense
of the behavior of the upper bound (for these values of $T$, we have completed only
1.5% of the necessary computation, hence the label “Extrapolation”). Unlike the
upper bound in [15], the new upper bound does not appear to flatten out, but
continues a downward progression as $T$ increases.




Fig. 4. Improved upper bound on MALHP for Rijndael 



7.1 Presentation of Final Results 

Upon completion of computation, we will post our final results in the IACR
Cryptology ePrint Archive (eprint.iacr.org) under the title Completion of
Computation of Improved Upper Bound on the Maximum Average Linear Hull
Probability for Rijndael.



8 Conclusion 

We have presented an improved version of the algorithm given in [15] (which
computes an upper bound on the maximum average linear hull probability (MALHP)
for SPNs) in the case of Rijndael. The improvement is achieved by taking into
account the distribution of linear probability values for the (unique) Rijndael
s-box. When 9 rounds of Rijndael are approximated, the new upper bound is
$2^{-92}$, which corresponds to a lower bound on the data complexity of $2^{97}$, for a
96.7% success rate. (This is based on completion of 43% of the computation.
However, we expect that the values obtained so far for $2 \le T \le 10$ core rounds
will remain unchanged; see Section 7.) This is a significant improvement over
the corresponding upper bound from [15], namely $2^{-75}$, for a data complexity
of $2^{80}$ (also for a 96.7% success rate). The new result strengthens our confidence
in the provable security of Rijndael against linear cryptanalysis.



Abstract. Cryptography and Coding Theory are closely knitted in
many respects. Recently, the problem of Decoding Reed Solomon Codes
(aka Polynomial Reconstruction) was suggested as an intractability
assumption upon which the security of cryptographic protocols can be
based. This has initiated a line of research that exploited the rich
algebraic structure of the problem and of its related subproblems in
the cryptographic setting. Here we give a short overview of recent works
on the subject and the novel applications that were enabled due to this
development.



1 Background 

The polynomial reconstruction (PR) problem with parameters $n, k, t$ is a natural
way of expressing the problem of (list-)decoding Reed Solomon Codes: given a
set of $n$ points over a finite field, it asks for all polynomials of degree less than
$k$ that “fit” into at least $t$ of the points. Translated to the coding theoretic
context, PR asks for all messages that agree with at least $t$ positions of the
received codeword, for a Reed Solomon code of rate $k/n$.
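
For very small parameters the problem statement can be made concrete by brute force. The sketch below generates a noisy instance over a small prime field and recovers every polynomial of degree less than k that agrees with at least t of the points, by interpolating all k-subsets. The prime, the parameter choices, and the function names are illustrative assumptions; this exhaustive search is not one of the decoding algorithms cited in the text and is exponential in general.

```python
import random
from itertools import combinations

P = 101  # small prime field size, for illustration only


def evaluate(coeffs, x, p=P):
    """Horner evaluation; coeffs are listed from the constant term upwards."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def make_instance(n, k, t, rng, p=P):
    """n points with distinct x-coordinates, exactly t of which lie on a
    random polynomial of degree < k; returns (points, secret coefficients)."""
    secret = tuple(rng.randrange(p) for _ in range(k))
    xs = rng.sample(range(p), n)
    good = set(rng.sample(range(n), t))
    points = []
    for i, x in enumerate(xs):
        y = evaluate(secret, x, p)
        if i not in good:
            y = (y + rng.randrange(1, p)) % p  # shifted off the polynomial
        points.append((x, y))
    return points, secret


def interpolate(pts, p=P):
    """Coefficients (low to high) of the polynomial of degree < len(pts)
    through the given points, via Lagrange interpolation."""
    k = len(pts)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(pts):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(pts):
            if j == i:
                continue
            # multiply the running basis polynomial by (X - xj)
            basis = [(lo - xj * hi) % p for lo, hi in zip([0] + basis, basis + [0])]
            denom = denom * (xi - xj) % p
        scale = yi * pow(denom, -1, p) % p
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % p
    return tuple(coeffs)


def solve_pr(points, k, t, p=P):
    """All polynomials of degree < k that agree with at least t points."""
    solutions = set()
    for subset in combinations(points, k):
        candidate = interpolate(subset, p)
        if sum(evaluate(candidate, x, p) == y for x, y in points) >= t:
            solutions.add(candidate)
    return solutions
```

With n = 8, k = 2 and t = 6, two distinct lines cannot both pass through six of the eight points, so the search returns exactly the hidden polynomial.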

Naturally, PR received a lot of attention from a “positive” side, i.e. how to
solve it efficiently. When $t > \frac{n+k}{2}$ then PR has only one solution and it can be
found with the algorithm of Berlekamp and Welch [BW86] ($\frac{n-k}{2}$ is the
error-correction bound of Reed-Solomon codes). The problem has been investigated
further for smaller values of $t$ ([Sud97,GS98,GSR95]). These works have pointed
to a certain threshold for the solvability of PR. Specifically, the problem appears
to be hard if $t$ is smaller than $\sqrt{kn}$ (the best algorithm known, by Guruswami
and Sudan [GS98], finds all solutions when $t > \sqrt{kn}$).

We note here that apart from any direct implications of efficient list-decoding
methods in the context of coding theory, these algorithms have proved
instrumental in a number of computational complexity results such as the celebrated
PCP theorem. There are numerous other works in computational complexity
that utilize (list-)decoding techniques such as: the average-case hardness of the
permanent [FL92,GPS99], hardness amplification [STV99], hardness of predicting
witnesses for NP-predicates [KS99] etc.

Perhaps the most notable work which applies the “negative” side of
error-correction decoding (i.e., its inherent hardness for certain parameters) is
McEliece’s cryptosystem [McE78]. Recently, in the work of Naor and Pinkas
[NP99], the above well-studied Reed-Solomon list-decoding problem (namely:
PR) has been looked at from a “negative” perspective, i.e. as a hard problem
which cryptographic applications can base their security on.

It is important to stress that from a cryptographic perspective we are not
interested in the worst-case hardness of PR but rather in the hardness of PR
on the average. It is easy to see that PR on the average (termed also noisy PR)
has only one solution with very high probability (note that we consider PR in a
large prime finite field). It is believed that noisy PR is not easier than
PR. This is because given an instance of PR it is possible to randomize the
solution polynomial (but it is not known how to randomize the noise, only to
$k$-wise randomize it). This justification was presented by [NP99] who gave the
basic suggestion to exploit the cryptographic intractability of this problem.

2 The Work of [NP99] 

In [NP99] the PR problem is first exploited in concrete and efficient
cryptographic protocol design. They presented a useful cryptographic primitive termed
“Oblivious Polynomial Evaluation” (OPE) that allowed the secure evaluation of
the value of a univariate polynomial between two parties. In their protocol the
security of the receiving party was based on a problem closely related to PR (later,
due to the investigation of [BN00], the protocol of OPE was easily modified to
be based directly on the PR problem [BN00,NP01]). We note here that a related
intractability assumption appeared independently in [MRW99].

Various useful cryptographic applications based on OPE were presented in
[NP99], such as password authentication and secure list intersection computation.
OPE proved to be a useful primitive in other settings, see e.g. [Gil99,KWHI01].

The assumption of [NP01] is essentially the following: given a (random)
instance of PR, the value of the (unique with high probability) solution
polynomial over 0 is pseudorandom for every polynomially bounded observer. Under
this “pseudorandomness” assumption it can be easily shown that the receiving
party in the OPE protocol is secure. Note that this assumption appears to be
stronger than merely assuming hardness on the average.



3 Structural Investigation of PR 

In [KY01b] we investigate cryptographic hardness properties of PR. The main
theme of this work is outlined below.

Given a supposedly computationally hard problem, it is important to identify
reasonable related (sub)problems upon which the security of advanced
cryptographic primitives such as semantically-secure encryption and pseudorandom
functions can be based. This practice is ubiquitous in cryptography, e.g. the
Decision-Diffie-Hellman problem is a subproblem related to the discrete-logarithm
problem upon which the semantic security of ElGamal encryption is based;
the Quadratic Residuosity Problem is a subproblem related to Factoring (and
modular square roots) upon which the semantic security of [GM84] is based, etc.
In [KY01b] a similar route is followed: first a suitable related subproblem of PR
is identified and then advanced cryptographic primitives based on this problem
are extracted. The problem is related to distinguishing one of the indices that
correspond to the polynomial points in a PR-instance. Distinguishing between
the points of the polynomial solution and the random points in a PR-instance
appears to be naturally related to the supposed hardness of PR. The corresponding
assumption is called Index-PR-Assumption (IPR). Subsequently under this
assumption, we show

1. A PR instance conceals its solution in a semantic level: any algorithm that 
computes a function on a new value of the polynomial-solution (which is not 
given in the input) that is distributed according to an adversarially chosen 
probability distribution has negligible advantage. 

2. The PR-Instances are pseudorandom. 

Regarding the Polynomial Reconstruction Problem itself as the assumption, 
we show that it has interesting robustness properties under the assumption of 
almost everywhere hardness. In particular, solving PR with overwhelming prob- 
ability of success is shown to be equivalent to: 

1. Computing a value of the solution-polynomial at a new point with
non-negligible success for almost all PR-instances.

2. Computing the least-significant-bit of a new value with non-negligible
advantage for almost all PR-instances.

These results suggest that PR and its related subproblem are very robust 
in the cryptographic sense and seem to be suitable problems for further crypto- 
graphic exploitation. A direct application of our work is that the OPE protocol 
of [NP99,NP01] can be shown semantically secure (based on the IPR assumption 
instead). 

4 Multisample Polynomial Reconstruction 

A straightforward way to generalize PR so that additional cryptographic applica- 
tions are allowed is the following: we can associate with any PR instance a set of 
indices (called the index-set) that includes the indices of the “good” points that 
correspond to the graph of the (with high probability unique) polynomial that 
“fits into” the instance. In the Multisample Polynomial Reconstruction (MPR) 
Problem, the given instance contains a set of r (random) PR-instances with the 
same index-set. The challenge is to solve all PR-instances. 

MPR was defined in [KY01a] and further investigated in [BKY01]. This latter
work points to a hardness threshold for the parameter $r$. Specifically MPR
appears to be hard when $r$ is smaller than $n/t$. MPR has similar robustness
properties as PR and is likewise sensitive to partial information extraction.
These properties are investigated in [KY01c] under the corresponding
Index-MPR-Assumption.

5 Cryptographic Applications 

In [KY01a] a general family of two-player games was introduced together with an
efficient protocol construction that allowed a variety of novel applications, such as
a deterministically correct, polylogarithmic Private Information Retrieval (PIR)
protocol. The security of these games, that involved the composition of many
multivariate polynomials, bilaterally contributed by the two parties, was based
on the hardness of MPR. Other applications of this work include: secure
computation of the Lists’ Intersection Predicate (a stringent version of the List
Intersection Problem [NP99] where the two parties want to securely check if the
two private lists have a non-empty intersection without revealing any items) and
Settlement Escrows and Oblivious Bargaining/Negotiations, which are protocol
techniques that are useful in the e-commerce setting.

In [KY01b,KY01c] PR and MPR are employed in the setting of symmetric
encryption to produce stream/block ciphers with novel attributes including:

— Semantic Security. 

— Error-Correcting Decryption. 

— The capability of sending messages that are superpolynomial in the security 
parameter (namely, a cryptosystem with a very short (sublinear) key size). 

— Double homomorphic encryption over the underlying finite field operations 
(with bounded number of multiplications). 

6 Conclusion 

The rich algebraic structure of Polynomial Reconstruction (PR), its related
problem (IPR) and its multisample version (MPR), has proved valuable in the
cryptographic setting. On the one hand, PR and its variants appear to be robust in
the cryptographic sense and can be used as a basis for advanced cryptographic
primitives (as exemplified in [KY01b, KY01c]). On the other hand, several
interesting cryptographic protocols that take advantage of the algebraic properties
of the problem have been introduced together with their applications in secure
computing and e-commerce (as seen in [NP99, KY01a]).



Abstract. Here we provide a comparison of several projective point
transformations of an elliptic curve defined over $GF(2^n)$ and rank their
performance. We provide strategies to achieve improved implementations
of each. Our work shows that under certain conditions, these strategies
can alter the ranking of these projective point arithmetic methods.



1 Introduction 



In [9,17], Koblitz and Miller independently proposed to use elliptic curves over 
a finite field to implement cryptographic primitives. One important primitive 
is the Diffie-Hellman key exchange (the elliptic curve version of the protocol is 
called the elliptic curve Diffie-Hellman key exchange, abbreviated as ECDH). 
The underlying task is computing the scalar multiple kP of a point P, where k 
is the user’s private key. 

The focus of this paper is on elliptic curves (EC) defined over fields
$GF(2^n)$. For the finite field $GF(2^n)$, the standard equation or Weierstrass
equation for a non-supersingular elliptic curve is:

$$y^2 + xy = x^3 + ax^2 + b$$

where $a, b \in GF(2^n)$. The points $P = (x, y)$, where $x, y \in GF(2^n)$, that satisfy
the equation, together with the point $O$, called the point at infinity, form an
additive abelian group $G$. Here addition in $G$ is defined by: for all $P \in G$

— $P + O = P$,

— for $P = (x, y) \ne O$, $-P = (x, x + y)$,

— and for all $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$, both not equal to the identity and
$P_1 \ne -P_2$, $P_1 + P_2 = P_3 = (x_3, y_3)$ where $x_3, y_3 \in GF(2^n)$ and satisfy:

$$x_3 = \begin{cases} \left( \dfrac{y_1 + y_2}{x_1 + x_2} \right)^2 + \dfrac{y_1 + y_2}{x_1 + x_2} + x_1 + x_2 + a & \text{if } P_1 \ne P_2 \\[2ex] x_1^2 + \dfrac{b}{x_1^2} & \text{if } P_1 = P_2 \end{cases} \qquad (1)$$

$$y_3 = \begin{cases} \left( \dfrac{y_1 + y_2}{x_1 + x_2} \right)(x_1 + x_3) + x_3 + y_1 & \text{if } P_1 \ne P_2 \\[2ex] x_1^2 + \left( x_1 + \dfrac{y_1}{x_1} \right) x_3 + x_3 & \text{if } P_1 = P_2 \end{cases}$$
The computation of a scalar multiple of a point $kP$ can be performed by
expressing $k$ in binary form $k = k_r k_{r-1} \cdots k_1 k_0$ and applying the “double and
add” method. That is,

$$kP = 2(\cdots 2(2(k_r P) + k_{r-1} P) + \cdots) + k_0 P .$$

The “add” operation requires 2 field multiplications, 1 square, and 1 inverse. The
“double” operation requires 2 field multiplications, 1 square, and 1 inverse.$^1$
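
A compact way to see the affine group law and the double-and-add loop working together is to run them over a very small binary field. The sketch below uses GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 and the curve coefficients a = b = 1; the field, the curve, and all function names are assumptions chosen for illustration and are far too small for cryptographic use. The routines are written for readability and are not optimised to the operation counts quoted above (the doubling here uses two inversions).

```python
MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)
A, B = 1, 1      # illustrative curve y^2 + xy = x^3 + A x^2 + B


def f_mul(x, y):
    """Multiply two elements of GF(2^8) (polynomial basis)."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        x <<= 1
        if x & 0x100:
            x ^= MODULUS
        y >>= 1
    return result


def f_inv(x):
    """Inverse of a nonzero element, computed as x^(2^8 - 2)."""
    result = 1
    for _ in range(254):
        result = f_mul(result, x)
    return result


def on_curve(point):
    """True if the point (or the point at infinity, None) is on the curve."""
    if point is None:
        return True
    x, y = point
    x_sq = f_mul(x, x)
    return f_mul(y, y) ^ f_mul(x, y) == f_mul(x_sq, x) ^ f_mul(A, x_sq) ^ B


def double(point):
    """2P for a point with nonzero x-coordinate."""
    x1, y1 = point
    x1_sq = f_mul(x1, x1)
    x3 = x1_sq ^ f_mul(B, f_inv(x1_sq))
    y3 = x1_sq ^ f_mul(x1 ^ f_mul(y1, f_inv(x1)), x3) ^ x3
    return (x3, y3)


def add(p1, p2):
    """Affine addition; None represents the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        if y2 == x1 ^ y1:  # p2 = -p1 (this also covers doubling when x1 = 0)
            return None
        return double(p1)
    lam = f_mul(y1 ^ y2, f_inv(x1 ^ x2))
    x3 = f_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = f_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def scalar_mul(k, point):
    """Left-to-right double-and-add over the binary digits of k."""
    result = None
    for bit in bin(k)[2:]:
        result = add(result, result)
        if bit == "1":
            result = add(result, point)
    return result


def find_point():
    """First affine point with nonzero x-coordinate, by exhaustive search."""
    for x in range(1, 256):
        for y in range(256):
            if on_curve((x, y)):
                return (x, y)
    return None
```

Each call to `add` or `double` performs at least one field inversion, which is exactly the cost that projective coordinates aim to defer.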

An alternative method for computing $kP$ is to use projective point coordinates.
In projective point arithmetic on an elliptic curve, unlike the standard affine
arithmetic, one delays the computation of an inverse until the very end of the
process of computing $kP$. The price is a rise in the number of required field
multiplications. That is, only one inverse takes place during the computation of
the key $kP$, whereas in the affine method, one inverse takes place for each “add”
function invoked (and as well, for each “double” function invoked).

The choice between “affine arithmetic” and “projective point arithmetic” should
be based on the ratio

(time to compute an inverse) / (time to multiply).

The larger this ratio, the more attractive it is to implement projective point 
arithmetic. Although there exists improved methods to compute inverses [21,7], 
the computation of an inverse will take significantly more time than a multipli- 
cation. For example, our implementation of field operations in GF{2^^^), using 
generating polynomial -I- -I- x® -I- x® -I- 1, was such that the performance 

of an inverse was over 10 times the time it took to perform a multiplication. 
This would be equivalent to “add” and “ double” functions which requires ap- 
proximately 13 multiplications each. This led us to investigate projective point 
arithmetic. 

In the following, we discuss alternate methods to develop projective point 
arithmetic, in an effort to determine the optimal method. We include benchmarks 
to illustrate which is the optimal method, ranking the methods by performance. 
We propose strategies which can lead to further improvement in performance, 
and in some cases lead to different conclusions with regard to such rankings. 



2 Some Implementation Strategies That Improve Efficiency

There are a number of resources that discuss efficiency improvements for elliptic
curve implementations. Some examples of efficiency improvements include:
improved key representations [2,24], improved field multiplication algorithms [18,6],
the use of projective point coordinates [1,4,14], the use of a point-halving
algorithm [22,13], and using the Frobenius map to improve efficiency [8,19,23].
Two resources that provide excellent overviews are [5,3]. For a review of ECC
implementations in the literature see [15]. In this section we provide a limited list of
strategies. This list incorporates strategies considered in our implementations.

¹ These requirements reflect the most efficient “add” and the most efficient “double”.



2.1 Field Multiplication Using a Lookup Table 

In [6], Hasan described a method which uses lookup tables to improve the
performance of the field multiplication. The idea is to precompute all $2^g$ possible
$g$-bit multiples of one multiplicand and place them in a lookup table. The product
is then computed by sliding over the $\lceil n/g \rceil$ non-overlapping $g$-bit windows
of the other multiplicand. In all of our implementations we use a four-bit
window, so the product is computed by sliding over the $\lceil n/4 \rceil$
non-overlapping 4-bit windows of the other multiplicand. Therefore each call to a
field multiplication creates a lookup table for one of the multiplicands.
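
A minimal Python sketch of such a windowed multiplication is given below. It is not the implementation benchmarked in this paper: field elements are held as Python integers, and the names `reduce`, `build_table`, `mul_with_table` and `mul` are hypothetical. The field is $GF(2^{163})$ with generating polynomial $x^{163} + x^7 + x^6 + x^3 + 1$ and the window width is $g = 4$.

```python
M = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
G = 4                                   # window width in bits

def reduce(c):
    """Reduce a binary polynomial (held as an int) modulo POLY."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= POLY << (i - M)
    return c

def build_table(a):
    """Precompute all 2^G products u(x) * a(x) with deg u < G (unreduced)."""
    table = [0] * (1 << G)
    for u in range(1, 1 << G):
        low = u & -u                    # lowest set bit of u
        # u*a = (u without its lowest bit)*a  xor  a shifted by that bit position
        table[u] = table[u ^ low] ^ (a << (low.bit_length() - 1))
    return table

def mul_with_table(table, b):
    """Slide over the ceil(M/G) non-overlapping G-bit windows of b, high to low."""
    windows = -(-M // G)
    acc = 0
    for w in range(windows - 1, -1, -1):
        acc = (acc << G) ^ table[(b >> (G * w)) & ((1 << G) - 1)]
    return reduce(acc)

def mul(a, b):
    """One field multiplication: build the table for a, then scan b."""
    return mul_with_table(build_table(a), b)
```

Separating `build_table` from `mul_with_table` makes the cost of the table explicit: every call to `mul` pays for one table construction.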



2.2 Alternate Key Representations When Computing kP 
Express k in NAF form 

To reduce the number of “additions”, one may express $k$ in NAF (non-adjacent
form) (see [3,23,2,6,5]). The NAF representation of an integer is a
“signed” binary representation such that no two consecutive digits are nonzero.
For example, $30 = 16 + 8 + 4 + 2 = 11110_2$, and a NAF for 30 is $1000\bar{1}0$, where we
use $\bar{1}$ to represent $-1$. In [2], it was shown that the expected weight of a NAF
of length $l$ is approximately $l/3$.
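
The NAF of an integer can be computed digit by digit from the least significant end. The following Python sketch uses the standard textbook recoding; the function name `naf` is an assumption and the code is not taken from our implementation.

```python
def naf(k):
    """Return the NAF digits of k >= 0, least significant first, each in {-1, 0, 1}."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)      # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d               # k is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

# Example from the text: 30 = 32 - 2, i.e. digits 1 0 0 0 -1 0 (most significant first).
print(list(reversed(naf(30))))
```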

Use a windowing technique on k 

Rather than computing $kP$ using the binary representation of $k = k_r \cdots k_1 k_0$,
one could precompute the first $b - 1$ multiples of $P$, then express $k$ in base $b$
(see [18,12,3]).



2.3 Koblitz Curves 

In [11], Koblitz suggested using anomalous binary curves (or Koblitz curves),
which possess properties that can lead to improvements in efficiency. A curve
described by the equation $y^2 + xy = x^3 + ax^2 + b$, where $b = 1$ and either $a = 0$ or
$a = 1$, describes a Koblitz curve. In this setting, the Frobenius map, denoted by
$\tau$, is such that $\tau : (x, y) \mapsto (x^2, y^2)$ and satisfies the equation
$2 = \tau - \tau^2$ (this is the case $a = 1$; for $a = 0$ the relation is
$2 = -\tau - \tau^2$). Using this equation, we express the key $k$ in $\tau$-adic form.
That is, since $k \in Z[\tau]$, $k$ which is $(k_i)_2$ can be expressed as
$k = (t_j)_\tau$. For example, $010_2 = \bar{1}10_\tau$ describes
the equation $2 = \tau - \tau^2$ (we use $\bar{1}$ to represent $-1$). This allows one to compute
$kP$ as $\sum_i t_i \tau^i P$, allowing us to use a “$\tau$ and add” method rather than a “double
and add” method [11,19,8]. The efficiency improvement is accomplished because
we are replacing the required 2 multiplications, 1 inverse and 1 square in the
“double” by 2 squares (the “$\tau$” of a point). Observe that in the “$\tau$ and add”
method, the time-consuming operation is the “add”.
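
One standard way to obtain a sparse $\tau$-adic expansion is Solinas's $\tau$-NAF recoding. The Python sketch below illustrates it under stated assumptions: it is not necessarily the recoding routine used in our implementation, it omits the usual reduction of $k$ modulo $(\tau^n - 1)/(\tau - 1)$, and the names `tau_naf` and `tau_eval` are hypothetical. The parameter `mu` is $+1$ for $a = 1$ and $-1$ for $a = 0$, so that $\tau^2 = \mu\tau - 2$.

```python
def tau_naf(k, mu):
    """tau-adic NAF digits of the integer k, least significant first."""
    r0, r1 = k, 0                        # the running value is r0 + r1*tau
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 & 1:
            u = 2 - ((r0 - 2 * r1) % 4)  # u is +1 or -1
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Divide r0 + r1*tau by tau, using 2 = tau*(mu - tau) and r0 even.
        half = r0 // 2
        r0, r1 = r1 + mu * half, -half
    return digits

def tau_eval(digits, mu):
    """Evaluate sum(d_i * tau^i) as a pair (c0, c1) meaning c0 + c1*tau."""
    c0, c1 = 0, 0
    for d in reversed(digits):
        # (c0 + c1*tau)*tau = -2*c1 + (c0 + mu*c1)*tau
        c0, c1 = -2 * c1, c0 + mu * c1
        c0 += d
    return c0, c1

# The example from the text: tau - tau^2 equals 2 when a = 1 (mu = +1).
print(tau_eval([0, 1, -1], 1))
```

Each nonzero digit costs one “add”, and each position costs one application of $\tau$ (two squarings), which is why the “add” dominates the running time.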

2.4 Other Implementation Issues 

The goal is to make relevant comparisons between different projective point
representations. In particular, we would like to determine the most efficient
representations and strategies for both Koblitz curves and “Random” curves. To make
these comparisons, we will be using two curves contained in the WAP/WTLS
list of curves [26]. One is a Koblitz curve with Weierstrass equation
$y^2 + xy = x^3 + x^2 + 1$, where the field is $GF(2^{163})$ defined by generating polynomial
$x^{163} + x^7 + x^6 + x^3 + 1$ (this curve has been included in many standards, and
is identified in the WTLS standard as Curve 3). The other curve² is defined by
$y^2 + xy = x^3 + ax^2 + b$ and the underlying field is $GF(2^{163})$ defined by the
generating polynomial $x^{163} + x^8 + x^2 + x + 1$ (this is identified in the WTLS
standard as Curve 5). As further assumptions, we will express the key in a NAF form
($\tau$-NAF when the curve is a Koblitz curve). We will not apply any windowing
techniques to the key. Our implementation of the field multiplication is such
that it will create a four-bit lookup table of one of the multiplicands (all possible
four-bit products of this multiplicand). Remember, the intent of this work is
two-fold: first, to provide an overview of projective point methods, and second, to
discuss efficiency improvements when using projective coordinates with a field
multiplication which uses lookup tables. The intention is to use benchmarking
to compare the methods, and to show the effect of different strategies on these
methods.

All benchmarks were created on an HP 9000/782 with a 236 MHz RISC
processor (32 bit). The table below illustrates the performance of the basic field
operations in $GF(2^{163})$ with generating polynomial $x^{163} + x^7 + x^6 + x^3 + 1$ on
this platform.

Operation     Time to compute
inverse       6.493 microsec
multiply      0.540 microsec
square        0.046 microsec



3 Projective Point Arithmetic 

A projective plane over a field $\mathbb{F}$ can be defined by fixing positive integers $\alpha, \beta$
and creating an equivalence relation where $(x, y, z) \sim (x', y', z')$ if
$(x', y', z') = (\lambda^\alpha x, \lambda^\beta y, \lambda z)$ where $\lambda \in \mathbb{F}$,
$\lambda \neq 0$. An equivalence class is called a projective
point in $\mathbb{F}$. Each affine point $(x', y') \in \mathbb{F} \times \mathbb{F}$ can be associated with the
equivalence class $(x', y', 1)$. All ordered triples $(x, y, z)$ within this class satisfy
$x' = x/z^\alpha$ and $y' = y/z^\beta$. The projective plane can be thought of as the union of the affine
plane together with all equivalence classes for which $z = 0$.

² where in hexadecimal EC parameter
a is 072546B5435234A422E0789675F432C89435DE5242 and
b is 00C9517D06D5240D3CFF38C74B20B6CD4D6F9DD4D9

Each affine point $(x, y)$ can be mapped into the projective plane by
$\phi : (x, y) \mapsto (x, y, 1)$. This map allows us to make a natural transformation from the
affine plane to the projective plane. Then $Image(\phi)$ consists of all equivalence
classes such that $z \neq 0$. Each equivalence class $(x, y, z)$ in $Image(\phi)$ can be
mapped to the affine plane by $x' = x/z^\alpha$ and $y' = y/z^\beta$; let us call this map $\psi$.
Observe then that $(\psi \circ \phi)(x, y) = (x, y)$.

Recall that we are considering an elliptic curve $E$ given by
$y^2 + xy = x^3 + ax^2 + b$ defined over $GF(2^n)$. The set of affine points satisfying this equation
form an additive abelian group $G$, and we have denoted this addition by $+$.
Together Equations (1) and Equations (2) describe the addition operation in $G$.
(Altogether we have four equations, two to describe $x_3$ and two to describe $y_3$.)
The projective point addition will be denoted by $+^*$. That is, for all $P, Q \in E$,
there exists a $\zeta \in \phi(E)$, such that $\zeta = \phi(P) +^* \phi(Q)$ and $\psi(\zeta) = P + Q$. The
goal is to describe $+^*$ using the projective point coordinates $\phi(P)$ and $\phi(Q)$ so
that a field inverse operation in $GF(2^n)$ is not required. For a brief discussion
on how to generate alternate projective point representations see the Appendix.

In practice the concern is to compute $kP$, where $P$ is a fixed affine point
$(x_2, y_2)$. (In fact if the intention is to compute the ECDH key, then $P$ represents
the other user’s public key.) This point $P$ is transformed to the projective point
$(x_2, y_2, 1)$. $Q$ will represent a projective point $(x_1, y_1, z_1)$, and will represent the
partial computation of $kP$ as we parse the key $k$. Thus we assume that $P$ is
of the form $P = (x_2, y_2, 1)$, that is, $z_2 = 1$. This assumption is adopted for all
projective point formulas described here (this assumption is referred to as using
mixed coordinates). From now on we will always make this assumption, and we
introduce all projective point arithmetic operations under this assumption. Do
note that when adding two EC points one needs to test the cases: are the two
points equal, is one point equal to the point at infinity, and is one point the
negative of the other. However, our intention is to consider the computation of
$kP$ where $P$ is a point in a subgroup of prime order. Thus except for the case
when $k$ = subgroup order $- 1$, these cases will never arise. So when discussing the
“add”, we omit testing for these cases.

3.1 The Homogeneous Projective Point Representation 

The Homogeneous projective point transformation [1,16,10] is such that the
relationship between affine points $(x', y')$ and projective point $(x, y, z)$, where $z \neq 0$,
is given by $x' = x/z$ and $y' = y/z$. For this setting, the projective point $(x, y, z)$
which belongs to $\phi(E)$ must satisfy $zy^2 + zxy = x^3 + azx^2 + bz^3$.

Assume $Q$ is a projective point $(x_1, y_1, z_1)$ and $P$ is $(x_2, y_2, 1)$; further we
assume both $P$ and $Q \neq O$ and that $P \neq -Q$. There are two cases to consider:
the “add” (when $P \neq Q$) and the “double” (when $Q = P$). If $P \neq \pm Q$, then
$P +^* Q = (x_3, y_3, z_3)$

\[
\begin{aligned}
x_3 &= AD \\
y_3 &= CD + A^2 (B x_1 + A y_1) \\
z_3 &= A^3 z_1
\end{aligned}
\tag{3}
\]

where $A = x_2 z_1 + x_1$, $B = y_2 z_1 + y_1$, $C = A + B$ and $D = A^2 (A + a z_1) + z_1 B C$. In the
case where $P = Q$, $2Q = (x_3, y_3, z_3)$

\[
\begin{aligned}
x_3 &= AB \\
y_3 &= x_1^4 A + B (x_1^2 + y_1 z_1 + A) \\
z_3 &= A^3
\end{aligned}
\tag{4}
\]

where $A = x_1 z_1$ and $B = b z_1^4 + x_1^4$.

3.2 The Jacobian Projective Point Representation 

In [4], G. Chudnovsky and D. Chudnovsky described the Jacobian projective
point representation. The IEEE working group P1363 [25] is developing a
public-key cryptography standard and has incorporated the Jacobian transformation
as their recommended manner to perform elliptic curve arithmetic when using
projective point representation. The Jacobian transformation is given by
$x' = x/z^2$ and $y' = y/z^3$. For this projective to affine transformation, the projective point
$(x, y, z)$ which belongs to $\phi(E)$ must satisfy $y^2 + zxy = x^3 + az^2x^2 + bz^6$.

To compute $P +^* Q = (x_3, y_3, z_3)$, with $Q = (x_1, y_1, z_1)$ and $P = (x_2, y_2, 1)$
where both $P$ and $Q \neq O$: if $P \neq \pm Q$

\[
\begin{array}{lll}
U_1 = x_1 & S_2 = y_2 z_1^3 & z_3 = L z_2 \\
S_1 = y_1 & R = S_1 + S_2 & T = R + z_3 \\
U_2 = x_2 z_1^2 & L = z_1 W & x_3 = a z_3^2 + TR + W^3 \\
W = U_1 + U_2 & V = R x_2 + L y_2 & y_3 = T x_3 + V L^2
\end{array}
\tag{5}
\]

(recall that $z_2 = 1$). In the case where $P = Q$, $2Q = (x_3, y_3, z_3)$

\[
\begin{array}{ll}
c = b^{2^{n-2}} & U = z_3 + x_1^2 + y_1 z_1 \\
z_3 = x_1 z_1^2 & y_3 = x_1^4 z_3 + U x_3 \\
x_3 = (x_1 + c z_1^2)^4 &
\end{array}
\tag{6}
\]

Here $c = \sqrt[4]{b}$ in $GF(2^n)$. The equations collectively numbered (5) are performed
columnwise left to right, top to bottom.

3.3 Lopez & Dahab 

In [14], Lopez and Dahab described an alternate projective point representation.
Here the relationship between affine point $(x', y')$ and projective point $(x, y, z)$
is $x' = x/z$ and $y' = y/z^2$. Then all projective points $(x, y, z)$ satisfy
$y^2 + xyz = zx^3 + az^2x^2 + bz^4$. To compute $P +^* Q = (x_3, y_3, z_3)$, with
$Q = (x_1, y_1, z_1)$ and $P = (x_2, y_2, 1)$ where both $P$ and $Q \neq O$: if $P \neq \pm Q$

\[
\begin{array}{lll}
A = y_2 z_1^2 + y_1 & z_3 = C^2 & G = x_3 + y_2 z_3 \\
B = x_2 z_1 + x_1 & E = AC & y_3 = EF + z_3 G \\
C = z_1 B & x_3 = A^2 + D + E & \\
D = B^2 (C + a z_1^2) & F = x_3 + x_2 z_3 &
\end{array}
\tag{7}
\]

In the case where $P = Q$, $2Q = (x_3, y_3, z_3)$

\[
\begin{aligned}
z_3 &= z_1^2 x_1^2 \\
x_3 &= x_1^4 + b z_1^4 \\
y_3 &= b z_1^4 z_3 + x_3 (a z_3 + y_1^2 + b z_1^4)
\end{aligned}
\tag{8}
\]
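
The Lopez & Dahab formulas (7) and (8) can be checked against the affine Equations (1) and (2) on a toy curve. The Python sketch below is only a consistency check under stated assumptions: the field $GF(2^5)$ with reduction polynomial $x^5 + x^2 + 1$, the parameters `A = B = 1`, and every function name are chosen for illustration and are not taken from our implementation. The inverse appears only in the affine reference and in the final conversion `to_affine`.

```python
M, POLY = 5, 0b100101          # toy field GF(2^5), x^5 + x^2 + 1 (assumption)
A, B = 1, 1                    # hypothetical curve y^2 + xy = x^3 + A x^2 + B

def mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def sq(a):
    return mul(a, a)

def inv(a):
    r = a
    for _ in range(M - 2):     # builds a^(2^(M-1) - 1)
        r = mul(sq(r), a)
    return sq(r)               # a^(2^M - 2)

def affine_add(P, Q):
    """Equations (1)-(2), case P != +-Q, neither point the identity."""
    (x1, y1), (x2, y2) = P, Q
    lam = mul(y1 ^ y2, inv(x1 ^ x2))
    x3 = sq(lam) ^ lam ^ x1 ^ x2 ^ A
    return x3, mul(lam, x1 ^ x3) ^ x3 ^ y1

def affine_double(P):
    """Equations (1)-(2), case P1 = P2 with x1 != 0."""
    x1, y1 = P
    x3 = sq(x1) ^ mul(B, inv(sq(x1)))
    return x3, sq(x1) ^ mul(x1 ^ mul(y1, inv(x1)), x3) ^ x3

def ld_add(Q, P):
    """Mixed addition (7): Q = (x1, y1, z1) projective, P = (x2, y2) affine."""
    (x1, y1, z1), (x2, y2) = Q, P
    a_ = mul(y2, sq(z1)) ^ y1
    b_ = mul(x2, z1) ^ x1
    c_ = mul(z1, b_)
    d_ = mul(sq(b_), c_ ^ mul(A, sq(z1)))
    z3 = sq(c_)
    e_ = mul(a_, c_)
    x3 = sq(a_) ^ d_ ^ e_
    f_ = x3 ^ mul(x2, z3)
    g_ = x3 ^ mul(y2, z3)
    return x3, mul(e_, f_) ^ mul(z3, g_), z3

def ld_double(Q):
    """Doubling (8)."""
    x1, y1, z1 = Q
    z3 = mul(sq(z1), sq(x1))
    bz4 = mul(B, sq(sq(z1)))
    x3 = sq(sq(x1)) ^ bz4
    return x3, mul(bz4, z3) ^ mul(x3, mul(A, z3) ^ sq(y1) ^ bz4), z3

def to_affine(Q):
    """x' = x/z, y' = y/z^2: the single inverse of the whole computation."""
    x, y, z = Q
    zi = inv(z)
    return mul(x, zi), mul(y, sq(zi))
```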



3.4 A Comparison of the Projective Point Representations 

Table 1 describes the computational requirements for each of the projective
point methods. Note, if the EC parameter $a$ is “sparse” and if it has to be
multiplied by a field element $t$, then the time to compute $at$ is not equivalent
to a field multiplication of two arbitrary elements. Many sources, for example
[14], disregard the number of field multiplications performed with the EC
parameters $a$ and $b$, assuming that one can choose an elliptic curve with sparse
parameters. However, in practice one may have to implement curves which have
been defined in standards (although it may be possible to make a transformation
so that one or more of the transformed EC parameters is sparse). For example,
as the WAP/WTLS standard developed, there were initially only two strong
elliptic curves defined over $GF(2^n)$, one a Koblitz curve (Curve 3) which has
parameters $a = b = 1$, and the other a random curve (Curve 5) where neither $a$ nor
$b$ is sparse. And so to disregard field multiplications with EC parameters
is not realistic. In Table 1 we have counted all field multiplications.



Table 1.

Representation    Operation    no. of mult.    no. of squares
Homogeneous       Add          13              1
                  Double        7              5
Jacobian          Add          11              4
                  Double        5              5
Lopez & Dahab     Add          10              5
                  Double        5              5

3.5 Are There More Efficient Projective Point Representations?

Of the three primary field operations in $GF(2^n)$ needed to perform projective
point arithmetic (add, multiply and square), the field multiply is the time
consuming operation. Although algebraically a square is a field multiplication, as
an implementation the square can be performed very efficiently in $GF(2^n)$. For
example, if you are using a normal basis, a square is a cyclic shift. If you are using
a polynomial basis, then the square can be implemented by inserting 0’s between
terms (see [3,21]), then reducing. That is, if $C = (C_0, \ldots, C_{n-1}) \in GF(2^n)$ then
$C^2 = (C_0, 0, C_1, 0, \ldots, C_{n-2}, 0, C_{n-1})$. The representation of $C^2$ uses $n + n - 1 = 2n - 1$
terms, so a significant reduction needs to take place. But this can be achieved
very efficiently.
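
Squaring in a polynomial basis by inserting zeros and then reducing can be sketched in Python as follows. Field elements are held as integers and the function names are assumptions; a production implementation would typically spread bits with a small byte-indexed table rather than one bit at a time. The field is $GF(2^{163})$ with generating polynomial $x^{163} + x^7 + x^6 + x^3 + 1$.

```python
M = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def reduce(c):
    """Reduce a binary polynomial (held as an int) modulo POLY."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= POLY << (i - M)
    return c

def spread(c):
    """Insert a zero between consecutive coefficients: bit i moves to bit 2i."""
    r, i = 0, 0
    while c:
        if c & 1:
            r |= 1 << (2 * i)
        c >>= 1
        i += 1
    return r

def square(c):
    """C^2 = reduce(spread(C)); no general multiplication is needed."""
    return reduce(spread(c))

def mul(a, b):
    """Generic shift-and-add multiplication, used here only for comparison."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce(r)
```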

Our interest in efficient projective point representations led us to pursue
alternate equations in an effort to reduce the time-consuming field multiplication.
Consider the set of equations described by the Jacobian transformation. Although
they are algebraically efficient, the equations force an increase in the number of
multiplications by the choice of an odd power. That is, computing $C$ raised to an
odd power requires a multiplication, whereas computing
$C$ raised to $2, 4, 8, \ldots$ requires only a series of squares.

Consider the projective point to affine point transformation $x' = x/z^\alpha$ and
$y' = y/z^\beta$. In the Jacobian, $\alpha = 2$ and $\beta = 3$. The odd power in the denominator for $y'$
forces additional field multiplications. In the Homogeneous transformation, both
$\alpha$ and $\beta$ are 1. This is inefficient in that both of the affine equations that generate
$y'$, Equation (2), include the term $x'$. Hence $\beta$ should be greater than $\alpha$. (This
is why the Homogeneous representation has so many more field multiplications
than the Jacobian.) It would appear that the “best” implementation of ECC
using projective point arithmetic with fields of the form $GF(2^n)$ would satisfy:
$\alpha$, $\beta$ and $\beta - \alpha$ are powers of 2. A solution is to have $\alpha = 2^j$ and $\beta = 2^{j+1}$
for non-negative integer $j$. This incorporates the suggestion that $\beta > \alpha$ and the
observation that the computation of $z^\alpha$ and $z^\beta$ can be done by performing a
series of squares. Of course this selection criterion determines a family of projective
point representations. It is trivial to show that the most efficient of this family
of projective point representations is the case when $j = 0$, which is the Lopez
& Dahab representation. Also note that the number of field multiplications³ for
the case $j = 1$ is equal to the number of field multiplications for the case $j = 0$
(the case $j = 1$ will require one more square operation than the $j = 0$ case).
This observation could have been generated solely from manipulating the Lopez
& Dahab representation.

4 Efficiency Improvements 

Clearly Table 1 shows that the most efficient projective point representation is
the one developed by Lopez & Dahab, next the Jacobian, and last the
Homogeneous projective point representation. However we have found that there can
be other factors that may influence performance, in addition to the number of
field operations. Further, one must consider the type of curve that one is
implementing. For example, if the curve is a Koblitz curve, then only the “add” must
be gauged, whereas if the curve is a random curve then both the “double” and
the “add” must be gauged. In the latter case, a further factor that may play a role is
whether the EC parameters $a$ and/or $b$ are sparse.

³ Our count refers to the number of field operations required to perform an “add” and
a “double” when you use these projective point representations.
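
The ranking implied by Table 1 can be reproduced with a rough operation-count model. The Python sketch below is an illustration only, under explicit assumptions: squares and field additions are ignored, the key has 163 digits with NAF density 1/3, and on a Koblitz curve the “double” is replaced by squarings and therefore contributes no multiplications. The function name `expected_mults` is hypothetical, and the model says nothing about lookup-table effects.

```python
# (multiplications per "add", multiplications per "double"), taken from Table 1
COUNTS = {
    "Homogeneous": (13, 7),
    "Jacobian": (11, 5),
    "Lopez & Dahab": (10, 5),
}

def expected_mults(add_m, double_m, bits=163, naf_density=1 / 3, koblitz=False):
    """Estimated field multiplications for one kP (squares not counted)."""
    doubles = 0 if koblitz else bits * double_m
    adds = bits * naf_density * add_m
    return doubles + adds

for name, (add_m, double_m) in COUNTS.items():
    print(name, round(expected_mults(add_m, double_m)),
          round(expected_mults(add_m, double_m, koblitz=True)))
```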

4.1 Create a Pipeline Effect Reusing Constructed Lookup Table 

A technique used to speed up computations is to pipeline common instructions
together. Recall that our field multiplication utilizes lookup tables to generate
the product. To achieve a pipeline effect we examined all products and looked
for common multiplicands, allowing us to share a lookup table between more than one
multiply. We applied this technique to all three projective point methods. There
are various strategies; the first strategy we employed was to allow at most one
lookup table to be created in RAM at a time.

We illustrate the multiplications that would take place in the “add” of a
Koblitz curve using the Lopez & Dahab method. Each rectangle represents a
lookup table to be generated.



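[Figure: the nine field multiplications in the “add” of a Koblitz curve using the
Lopez & Dahab method, grouped into rectangles, one rectangle per lookup table.
The products shown are $z_1^2 \cdot y_2$; $z_1 \cdot x_2$; $z_1 \cdot (x_1 + z_1 x_2)$;
$\sqrt{z_3} \cdot (x_1 + z_1 x_2)^2$; $\sqrt{z_3} \cdot (y_1 + z_1^2 y_2)$; $z_3 \cdot x_2$;
$(\sqrt{z_3} \cdot (y_1 + z_1^2 y_2)) \cdot (z_3 \cdot x_2 + x_3)$; $z_3 \cdot y_2$;
$z_3 \cdot (x_3 + z_3 y_2)$.]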





The result is that the nine required field multiplications within the “add” can
be arranged so that only 5 lookup tables are required. A word of warning: the
existence of common multiplicands does not imply that a lookup table reduction
can take place, for there may be a dependency between two multiplications. We
have implemented a pipe which allowed us to add a field element to an output
and take the sum as the next input (which is why we used the same lookup table for
$z_3 \cdot y_2$ and $z_3 \cdot (x_3 + z_3 \cdot y_2)$). All our piping was restricted to allowing at most
one field addition.
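
The sharing of one lookup table between dependent products can be sketched as follows. This Python code is a hypothetical illustration, not our implementation: the class `LookupTable`, its counter `built`, and the helper `tail_of_add` are assumed names. It computes $z_3 \cdot (x_3 + z_3 \cdot y_2)$ with a single table for $z_3$ and one field addition in the pipe.

```python
M = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def reduce(c):
    """Reduce a binary polynomial (held as an int) modulo POLY."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= POLY << (i - M)
    return c

class LookupTable:
    """4-bit window table for one multiplicand; `built` counts constructions."""
    built = 0

    def __init__(self, a):
        LookupTable.built += 1
        self.rows = [0] * 16
        for u in range(1, 16):
            low = u & -u
            self.rows[u] = self.rows[u ^ low] ^ (a << (low.bit_length() - 1))

    def mul(self, b):
        """Multiply the tabulated element by b, scanning 41 windows of 4 bits."""
        acc = 0
        for w in range(40, -1, -1):
            acc = (acc << 4) ^ self.rows[(b >> (4 * w)) & 0xF]
        return reduce(acc)

def tail_of_add(x3, y2, z3):
    """z3 * (x3 + z3 * y2): two multiplications, one table, one field addition."""
    table = LookupTable(z3)
    return table.mul(x3 ^ table.mul(y2))
```

Without sharing, the same two products would construct two tables. Here the second product depends on the first, yet it can still reuse the table because the dependency is on the other multiplicand.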

In our implementation, we saw a substantial improvement (10% to 18% relative to
the nonpipelined benchmark) when we incorporated a pipelined multiplication (which
minimizes the number of multiplication lookup tables created within the “add”).
The following table highlights our benchmarks for the pipelined version of the
Jacobian, Homogeneous, and Lopez & Dahab. We point out that the Homogeneous
transformation performed slightly better than the Jacobian transformation (a
reverse of what one may infer from Table 1). Within the Jacobian transformation
we found many of the multiplications were dependent upon each other, which
limited our reduction of lookup tables.

Table 2. For a Koblitz curve

VERSION          Number of   Compute kP     Lookup tables   Compute kP   Improvement
                 mult.       nonpipeline    pipeline-add    pipeline
Lopez & Dahab     9          3.550 ms       5               2.982 ms     16%
Jacobian         10          3.926 ms       7               3.525 ms     10%
Homogeneous      12          4.240 ms       6               3.486 ms     18%


4.2 Applying the Pipelining to a Random Curve 

We then applied the technique of pipelining multiplications with a common
multiplicand in our implementation when using a random curve. To achieve a
minimal number of lookup tables, the piping choices made for a random curve
will differ from the choices made for a Koblitz curve. For example, with the
Lopez and Dahab implementation the pipelined “add” required 5 lookup tables and the
pipelined “double” required 4 lookup tables. (See the appendix for an illustration of how
to construct a minimum number of lookup tables for the add and the double of
a Random curve.)



Table 3. For a Random curve

VERSION    Number of   Number of   Compute kP    Lookup tables   Lookup tables     Compute kP
           mult.,      mult.,      nonpipeline   pipeline add    pipeline double   pipeline
           add         double
L & D      10          5           7.987 ms      5               4                 6.908 ms
Jacobian   11          5           8.276 ms      8               5                 7.910 ms
Homog.     13          7           10.445 ms     6               4                 8.259 ms

As expected the Lopez & Dahab method performed the best. Again we see a
large improvement in the Homogeneous method (from 10.445 ms to 8.259 ms) when
we pipeline like multiplicands together, although in this case it does not
out-perform the Jacobian method. When implementing all three methods on a Koblitz
curve, the time to compute “$\tau$” of a point is the same for all three methods. Thus
the comparison of the three methods for a Koblitz curve is really a comparison of the “add”
for each. Whereas in the implementation for the Random curve, the “double” is
computed for each bit of the key, and so this comparison illustrates how “double”
affects performance.

5 Other Strategies 

Although we have achieved an implementation which requires a minimum number
of lookup tables for both the “double” and “add”, there are further improvements
that can be achieved; however, some may require more memory. That is,
presently we have created an elliptic curve implementation over binary fields which
will utilize lookup tables for field multiplications such that only one lookup table
will exist in RAM at a time. We can improve on this implementation slightly at
a cost of memory by allowing more than one lookup table to exist.

Choices we made earlier were based on the requirement that only one lookup
table existed at a given time. Thus the RAM requirement for this implementation
is the same as the RAM requirement for a nonpipelined implementation of
the computation of $kP$. The pipeline was developed to meet this requirement and
minimize the number of lookup tables generated. Suppose we relax this requirement
and allow multiple lookup tables to exist at a given time. Observe that a
number of field multiplications will contain an EC parameter as a multiplicand
(i.e. $a$ and/or $b$). Consequently, we can generate lookup tables for both $a$ and $b$
during the initial stage and save them throughout the computation. Normally a
lookup table represents all 16 4-bit multiples of the field element. A field element in
$GF(2^{163})$ requires at most 21 bytes, so the size of a lookup table is at least
$16 \times 21 = 336$ bytes. (In practice our lookup tables were such that each multiple was expressed
using 6 words requiring 384 bytes). Because a table for a and/or b is such that it
is used throughout the entire kP computation, the expected number of times a 
4-bit multiple of a is used (say from the “add”) is heuristically 160- 1 ^ for each 
occurrence of a multiply of an a in an “add” and 160 • ^ for each occurrence of 
a multiply of an a in the “double” . A similar analysis is true for multiples of b. 
Thus one finds an argument for increasing the window size from 4-bit to 6-bit 
or 8-bit, consequently we generated larger lookup tables for a and b, saved them 
and used these tables throughout the computation of kP. 

Note that in the “add”, there exist products which have a multiplicand $x_2$
and/or $y_2$, where $P$ of the $kP$ computation is such that $P = (x_2, y_2)$. Fortunately,
we were not implementing any windowing. However, because we are working with
a NAF form, we will utilize both $P$ and $-P$. Consequently, there are two possible
$y_2$ and one possible $x_2$. Again we can generate lookup tables for $x_2$ and both the $y_2$
of $P$ and the $y_2$ of $-P$ during the initial stage, save them and use these tables
throughout the computation of $kP$. (Do note that if one uses a 6-bit window
in their implementation, and opts for the strategy of generating a precomputed
lookup table for $y_2$ and saving it throughout the computation, then it is required
to compute and save $2^6 - 1$ lookup tables.)

Lastly, observe that in the Lopez & Dahab method when we compute the
“add”, a required multiplication is $z_3 x_3$ (this is required to compute $y_3$). Further
note that a required computation in the “double” is $x_1 z_1$. Of course in every
case, but the case when we are processing the last bit of the key, a “double” will
follow an “add”. Also, in every such case the product $x_3 z_3$ in the “add” is the
same as $x_1 z_1$ in the following “double”. Consequently, if we save the product
$x_3 z_3$ from the “add” workspace and use it in the following “double”, then we
reduce the number of field multiplications for the “double” by one. Note we are
suggesting saving a computation from a workspace, thus if we are computing the
“double” after a “double” we will not have this product saved and so we must
compute $x_1 z_1$. Consequently when we discuss the number of field multiplications
for the “double”, we must use the “expected number of field multiplications”. This
“expected number” is of course correlated to the Hamming weight of the key.
A simpler way to account for this saving is to report it as a reduction in the
“add”: even though the saving occurs in the “double”, the following table reports
it as a reduction within the “add” function.

In the table that follows, we have included the before and after benchmark 
performance of our Lopez & Dahab implementation (here the after refers to the 
use of the above strategies). 

One can opt for any or all of the RAM increasing pipelining strategies.
However, it is redundant to try to utilize all three strategies to improve performance.
In fact for a Koblitz curve, only strategy 2 is relevant (precomputing lookup
tables for $x_2$ and/or $y_2$). For a Random curve, we suggest strategies 1 and 3 and
to increase the lookup table window for parameters $a$ and $b$. In the appendix
we describe an implementation for an “add” and a “double” which utilizes only
strategy 1. The following table describes an implementation of a Random curve
that utilizes strategies 1 and 3, and uses an increased window for the lookup tables
for $a$ and $b$.



Random Curve using RAM increasing pipelining strategies

VERSION          Number of mult.   Lookup tables   Compute kP   Compute kP
                 add/double        add/double      pipeline     improved pipeline
Lopez & Dahab    9/5               4/3             6.908 ms     6.342 ms



Thus by using the Lopez & Dahab method, pipelining, and using the RAM
increasing pipeline strategies that we have outlined, we have reduced a scalar
multiplication from the original time of 7.983 ms to 6.342 ms. (Note that in the table
above we have reduced the number of multiplies in an “add” by one due to the
strategy of saving the computation of $x_3 z_3$ in an “add” and using it as the $x_1 z_1$
in the following “double”.)

6 A Suggestion on Squaring in $GF(2^m)$

Let $\mu \in GF(2^m)$, then $\mu = \mu_0 + \mu_1 x + \mu_2 x^2 + \cdots + \mu_{m-1} x^{m-1}$. We will also
use $(\mu_0, \ldots, \mu_{m-1})$ to represent $\mu$. Further, if $\mu_i = 0$ for all $i > j$ then we may
represent $\mu$ by $(\mu_0, \ldots, \mu_j)$. Let $x^m + p(x)$ represent the generating polynomial
of the field.

Several sources [21,3] have observed that a square can be computed by
inserting 0’s between terms and reducing. That is,
$\mu^2 = \mu_0 + \mu_1 x^2 + \mu_2 x^4 + \cdots + \mu_{m-1} x^{2m-2} = (\mu_0, 0, \mu_1, 0, \mu_2, \ldots, 0, \mu_{m-1})$,
which needs to be reduced to
compute the square. The required reduction can be inefficient, for it involves
performing a series of shifts and adds. The shifts are determined by the terms
of the generating polynomial, and they are performed on the terms of $\mu^2$. This
inefficiency exists because every other term of $\mu^2$ is zero, thus performing an xor
(an add in $GF(2^m)$) with a shift can be a waste. For example if the shift was an
even length, we will see that we are wastefully computing an xor of a 0 with a
0. If the shift is an odd length then we are performing an xor with a value and
a zero, rather than simply resetting a term.

Our suggestion involves using the odd and even “parts” of a polynomial and 
performing the shifting and adds in place. That is, performing shifts and adds 
on either the odd or even “parts” of a polynomial. 

Consider the calculation of $\mu^2$. Let $A = (\mu_0, 0, \mu_1, \ldots, \mu_{\frac{m-1}{2}})$ and
$B = (0, \mu_{\frac{m-1}{2}+1}, 0, \mu_{\frac{m-1}{2}+2}, 0, \ldots, 0, \mu_{m-1})$, then both $A$ and $B$ are of
degree at most $m - 1$. Here $A$ is an even polynomial and $B$ is an odd polynomial.
Note that $\mu^2 = A + x^m B$. So what remains is to reduce $x^m B$.

We will explain our algorithm using an example. Consider the field $GF(2^m)$
with generating polynomial $x^{163} + x^7 + x^6 + x^3 + 1$ (i.e. $m = 163$). So

\[ x^m B = p(x) B = (x^7 + x^6 + x^3 + 1) B = x^7 B + x^6 B + x^3 B + 1B. \]

Since odd $\cdot$ odd is even and odd $\cdot$ even is odd, we see that $x^6 B$ and $1B$ are odd
polynomials, and $x^7 B$ and $x^3 B$ are even polynomials. However, $x^3 B$, $x^6 B$, and
$x^7 B$ are of degree greater than 162 (so they will require a multiplication with
$p(x)$ but only for a few coefficients).

Next observe that $A$ and $B$ have alternating zeros, as do $x^3 B$, $x^6 B$, and
$x^7 B$. We can efficiently encode this representation: encode $A$ by
$A = (\mu_0, \mu_1, \ldots, \mu_{81})_{EVEN}$ and encode $B$ by
$B = (\mu_{82}, \ldots, \mu_{162})_{ODD}$. The subscripts EVEN and ODD refer to whether the
polynomials are even or odd. Also note that $A$ and $B$ have different lengths
(although we may view them as the same length but that the coefficient of the
leading term of $B$ is 0). Multiplication of $B$ by $x^i$ is a right shift, introducing
zeros on the left and possibly generating a polynomial of degree $> m - 1$ (i.e.
in this case $m - 1 = 162$). If this latter case occurs then we will call it an
“overflow”; the result is that these terms will need to be multiplied by $p(x)$.
We will use $SH_i$ to represent this right shift caused by multiplying by $x^i$. Then

\[
\begin{aligned}
xB &= SH_1(B) = (0, \mu_{82}, \mu_{83}, \ldots, \mu_{162})_{EVEN}, \\
x^2 B &= (0, \mu_{82}, \mu_{83}, \ldots, \mu_{161})_{ODD} + \mu_{162}\, p(x) = SH_2(B) + \mu_{162}\, p(x), \\
x^3 B &= (0, 0, \mu_{82}, \mu_{83}, \ldots, \mu_{161})_{EVEN} + \mu_{162}\, x\, p(x) = SH_3(B) + \mu_{162}\, x\, p(x),
\end{aligned}
\]

and so forth. Note

\[
\begin{aligned}
x^6 B &= (0, 0, 0, \mu_{82}, \mu_{83}, \ldots, \mu_{159})_{ODD} + (\mu_{162} x^4 + \mu_{161} x^2 + \mu_{160})\, p(x) \\
      &= SH_6(B) + (\mu_{162} x^4 + \mu_{161} x^2 + \mu_{160})\, p(x), \\
x^7 B &= (0, 0, 0, 0, \mu_{82}, \mu_{83}, \ldots, \mu_{159})_{EVEN} + (\mu_{162} x^5 + \mu_{161} x^3 + \mu_{160} x)\, p(x) \\
      &= SH_7(B) + (\mu_{162} x^5 + \mu_{161} x^3 + \mu_{160} x)\, p(x).
\end{aligned}
\]

We can compute $\mu^2$ by computing the even part, the odd part and the overflow
part.

\[
\begin{aligned}
\text{even part} &= A + SH_3(B) + SH_7(B) \\
\text{odd part} &= B + SH_6(B)
\end{aligned}
\]

We write the overflow as the sum of the overflow terms of $x^3 B$, $x^6 B$ and $x^7 B$:

\[ \text{overflow} = p(x) \left( \mu_{162} (x^5 + x^4 + x) + \mu_{161} (x^3 + x^2) + \mu_{160} (x + 1) \right) \]

The overflow is not decomposed in odd and even form.

What remains is to transfer even part and odd part to correct form. If $f$
represents some algorithm which takes as input the even part and odd part polynomials
and outputs the polynomial, then $\mu^2 = \text{overflow} + f(\text{even part}, \text{odd part})$. This
algorithm $f$ is very similar to the algorithm which inserts zeros between terms,
except it will insert odd coefficients between even coefficients.

Consider the algorithm that determines a square by inserting zeros between
terms and reducing. The reduction consists of multiplying the higher-degree
terms by the generating polynomial, which creates a series of xor (addition)
operations (one less than the number of terms in the generating polynomial).
One would normally perform each xor operation on the entire multi-precision
integer, even though every other bit is zero. Thus the xor does not need to be
performed on the entire multi-precision integer; doing so performs twice as
many bitwise-xor operations as are really needed. Our algorithm computes the
square without performing the wasted bitwise-xor operations (essentially, it
halves the number of bitwise-xor operations performed). However, in our
algorithm there is still a need to compute the polynomial from its even and
odd parts (this is comparable to inserting zeros between terms).
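
The decomposition can be checked with a short Python sketch. It is only an illustration under assumptions that are not stated in this section: the field polynomial is taken to be $x^{163} + x^7 + x^6 + x^3 + 1$ (which is consistent with the shifts $SH_3$, $SH_6$ and $SH_7$ used above), polynomials are stored as Python integers (bit $i$ is the coefficient of $x^i$), and all function names are hypothetical. It confirms that the even/odd form gives the same square as inserting zeros and reducing; it does not model the word-level xor savings.

```python
M = 163
LOW = (1 << 7) | (1 << 6) | (1 << 3) | 1   # assumed p(x) = x^7 + x^6 + x^3 + 1
F = (1 << M) | LOW                         # assumed field polynomial x^163 + p(x)


def reduce_mod_f(v):
    """Reduce a binary polynomial modulo F by cancelling the top bit repeatedly."""
    while v.bit_length() > M:
        shift = v.bit_length() - 1 - M
        v ^= F << shift
    return v


def spread(v):
    """Insert a zero between consecutive bits: bit i moves to bit 2i."""
    out, i = 0, 0
    while v:
        if v & 1:
            out |= 1 << (2 * i)
        v >>= 1
        i += 1
    return out


def square_insert_zeros(a):
    """Baseline: insert zeros between all coefficients, then reduce."""
    return reduce_mod_f(spread(a))


def square_even_odd(a):
    """Square via the even part, the odd part and a final (small) reduction."""
    low = a & ((1 << 82) - 1)       # coefficients mu_0 .. mu_81
    high = a >> 82                  # coefficients mu_82 .. mu_162
    A = spread(low)                 # even polynomial of degree <= 162
    B = spread(high) << 1           # odd polynomial: mu_{82+j} * x^(2j+1)
    even_part = A ^ (B << 3) ^ (B << 7)
    odd_part = B ^ (B << 6)
    # The few terms of degree > 162 are the "overflow"; reduce_mod_f handles them.
    return reduce_mod_f(even_part ^ odd_part)
```

The two routines agree because $x^{2i} = x^{163} \cdot x^{2i-163} \equiv p(x)\,x^{2i-163}$ for $i \ge 82$, so the square equals $A + B\,p(x)$ before the final reduction.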

In practice one should find that this algorithm performs better than the
method that inserts zeros and reduces, although the improvement will be
marginal for small field sizes.

7 Conclusion 

Our work has provided a survey and comparison of several projective point
representations. In addition we have provided strategies that lead to
efficiency improvements, and these strategies may alter the comparison
rankings of these projective point representations. We have provided further
strategies that one can implement at a cost of memory. Lastly we have
illustrated how to utilize the odd and even "parts" of a polynomial to compute
a square in $GF(2^n)$.

The author wishes to thank the reviewers for their valuable and constructive
suggestions.

8 Appendix

Using the strategies discussed concerning precomputing and saving lookup ta- 
bles, one has great flexibility in deriving the “add” and “double” formula. In the 
following, we provide an example of a pipelined “add” and a pipelined “double” . 
We utilize strategy 1, and require precomputed lookup tables for EC parameter 
a and EC parameter b. 



Random curve: Lookup tables - add 





Z1X2 

zi(xi -1- 21 * 2 ) 


2 ? 2/2 


21 (*1 -1- ZlX2) ■ ( 2/1 + 2 ^ 2 / 2 ) 
2 i(a;i -1- Z1X2) ■ (*1 -f 21 * 2 )^ 




Z3X2 

23(2/2 + a) 

23 • (x3 + 232/2)) 
zsa 


[ 2 i(a:i -1- 21 * 2 ) • ( 2/1 + Ziy 2 )](z 3 X 2 + X3) 



Random curve: Lookup tables - double 



ZlXl 



zfb 



{yi +xl)x 3 



(ziXi) ■ (bzf) 

(ziXi) ■ {zixi{bzf + {yi + x\)x3 



Example of a pipelined add for a Random curve 



(1) 


D 4- 


^2 


(12) 23 4 


72 

23 


(2) 


make LT(2i) 


(13) a;3 4 


— 023 NO LT 


(3) 




A 4 — Z1X2 


(14) X3 4 


— X3 + D 


(4) 




23 4 — z\{A + x\) 


(15) X3 4 


— X3 -\- A -\- B 


(5) 


B 4- 


— a- D NO LT 


(16) make LT(z3) 


(6) 


B 4- 


— B + Vi 


(17) 


X3 4 — Z3X2 


(7) 


A 4 - 


— A^ 


(18) 


B 4 232/2 


(8) 


make LT(23) 


(19) 


B ^ S{B + X 3 ) 


(9) 




D 4 — Z3B 


(20) 2/3 - 


3:3 -b 2/3 


(10) 




A 4 — Z3A 


(21) 2/3 4 


— 2/3 • 71 


(11) B. 


~ 52 


(22) 2/3 4 


— 2/3 + -B 
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Example of a pipelined double for a random curve 



(1) 


B ^ 


-4 


(8) 


- yl 


(2) 




- xl 


( 9 ) 


-C + B 


( 3 ) 


Z3 ^ 


- A-B 


(10) 7/3 ^ 


— az3 NO 


( 4 ) 


B ^ 


- 


( 11 ) V 3 ^ 


— V 3 + C 


( 5 ) 


B ^ 


- bB NO LT 


(12) V 3 ^ 


— V3 ■ X3B 


(6) 


X3 ^ 


- A2 


( 13 ) 


- Z3- B 


( 7 ) 


X3 ^ 


— x^ + b 


( 14 ) 7/3 ^ 


— V 3 + C 



As stated earlier, each multiplication requires a lookup table. For those
cases where we share a lookup table we have indicated the creation of the
lookup table (by Make LT( )). Otherwise, the lookup table is generated within
the multiplication algorithm. However, in line (5) we refer to NO LT. The
implication is that no lookup table needs to be generated here: $b$ is a
coefficient of the Weierstrass equation, so one can create the lookup table
for $b$ at the start of the computation of $kP$, and save it.

8.1 How to Generate Alternate Projective Point Representations 

We continue with the notation described in section 3 . 

Often there is more than one set of equations which describes $+^*$. Recall
that $(x', y') \sim (x, y, z)$ implies $x' = x/z^{\alpha}$ and
$y' = y/z^{\beta}$. Fix positive integers $\alpha$ and $\beta$. Let
$P = (x'_1, y'_1)$ and $Q = (x'_2, y'_2)$. Step one: replace $x'_i$ and $y'_i$
found in Equations (1) by the ratios $x_i/z_i^{\alpha}$ and $y_i/z_i^{\beta}$
respectively for $i = 1, 2$ (these equations determine $x'_3$). Step two: in
Equations (2) replace $x'_i$ and $y'_i$ by the ratios $x_i/z_i^{\alpha}$ and
$y_i/z_i^{\beta}$ respectively ($i = 1, 2$) and replace $x'_3$ by the correct
case of the two equations described in step one. The result will be, after
simplification, four rational equations in the variables
$x_1, y_1, z_1, x_2, y_2, z_2$: one set of equations for $x'_3$ (one case for
$P \neq Q$ and the other case for $P = Q$) and another set of equations for
$y'_3$ (one case for $P \neq Q$ and the other case for $P = Q$). To provide an
example, choose $z_3$ to be the least common multiple of all four denominators
(which is a polynomial in $x_1, y_1, z_1, x_2, y_2, z_2$). Set
$x_3 = z_3^{\alpha} x'_3$; as $\alpha$ is a positive integer, the denominator
will cancel out (in both cases $P \neq Q$ and $P = Q$), so that $x_3$ is equal
to a polynomial in $x_1, y_1, z_1, x_2, y_2, z_2$. Set
$y_3 = z_3^{\beta} y'_3$; for a similar reason $y_3$ is equal to a polynomial
in $x_1, y_1, z_1, x_2, y_2, z_2$. The result is four equations, two which
describe $x_3$ and two which describe $y_3$. The pair of equations that
describe $z_3$ are identical (the least common multiple). For two projective
points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, both not equal to the
identity, we describe the equations that define $+^*$ as:
$(x_1, y_1, z_1) +^* (x_2, y_2, z_2) = (x_3, y_3, z_3)$, where $z_3$ was chosen
to be the least common multiple of all four denominators, and where
$x_3 = z_3^{\alpha} x'_3$ and $y_3 = z_3^{\beta} y'_3$. If one of the points
is the identity, then the sum is the other point. The goal was to describe the
equations so that the inverse did not need to be employed. The result $+^*$ is
a binary operation on Image($\phi(E)$) such that
$\psi(\phi(P) +^* \phi(Q)) = P + Q$. This argument illustrates how to define
$+^*$ on $\phi(E)$. Most important, we are able to achieve the addition
without the use of a field inverse.

Abstract. We propose a method for increasing the speed of scalar
multiplication on binary anomalous (Koblitz) elliptic curves. By introducing
a generator which produces random pairs $(k, [k]P)$ of special shape, we
exhibit a specific setting where the number of elliptic curve operations is
reduced by 25% to 50% compared with the general case when $k$ is chosen
uniformly. This generator can be used when an ephemeral pair $(k, [k]P)$
is needed by a cryptographic algorithm, and especially for Elliptic Curve
Diffie-Hellman key exchange, ECDSA signature and El-Gamal encryption. The
presented algorithm combines normal and polynomial basis operations to
achieve optimal performance. We prove that a probabilistic signature scheme
using our generator remains secure against chosen message attacks.

Key words: Elliptic curve, binary anomalous curve, scalar multiplica- 
tion, accelerated signature schemes, pseudo-random generators. 



1 Introduction 

The use of elliptic curves (EC) in cryptography was first proposed by Miller
[8] and Koblitz [4] in 1985. Elliptic curves provide a group structure, which
can be used to translate existing discrete logarithm-based cryptosystems. The
discrete logarithm problem in a cyclic group $G$ of order $n$ with generator
$g$ refers to the problem of finding $x$ given some element $y = g^x$ of $G$.
The discrete logarithm problem over an EC seems to be much harder than in
other groups such as the multiplicative group of a finite field, and no
subexponential-time algorithm is known for the discrete logarithm problem in
the class of non-supersingular EC whose trace is different from zero and one.
Consequently, keys can be much smaller in the EC context, typically about
160 bits.

Koblitz described in [5] a family of elliptic curves featuring several
attractive implementation properties. In particular, these curves allow very
fast scalar multiplication, i.e. fast computation of $[k]P$ from any point $P$
belonging to the curve. The original algorithm proposed by Koblitz introduced
an expansion method based on the Frobenius map to multiply points on elliptic
curves defined over $\mathbb{F}_2$, $\mathbb{F}_4$, $\mathbb{F}_8$ and
$\mathbb{F}_{16}$. An improvement due to Meier and Staffelbach was proposed
in [6] and later on, Solinas introduced in [19] an even faster algorithm.

Many EC cryptographic protocols such as Elliptic Curve Diffie-Hellman for key
exchange [13] and ECDSA for signature [13] require the production of fresh
pairs $(k, [k]P)$ consisting of a random integer $k$ and the point $[k]P$. A
straightforward way of producing such pairs is to first generate $k$ at random
and then compute $[k]P$ using an efficient scalar multiplication algorithm.
Another possibility, introduced and analysed in [16,18,17,14,15], consists in
randomly generating $k$ and $[k]P$ at the same time, so that fewer elliptic
curve operations are performed.

In this paper we focus on Koblitz (or anomalous) elliptic curves in
$\mathbb{F}_{2^n}$. By introducing a generator producing random pairs
$(k, [k]P)$, we are able to exhibit a specific setting where the number of
elliptic curve additions is significantly reduced compared to the general case
when $k$ is chosen uniformly. The new algorithm combines normal and polynomial
basis operations to achieve optimal performance. We provide a security proof
for probabilistic signature schemes based on this generator.

The paper is organized as follows: in section 2 we briefly recall the basic
definitions of elliptic curves and operations over a finite field of
characteristic two. In section 3 we recall the definition of binary anomalous
(Koblitz) curves for which faster scalar multiplication algorithms are
available. We also recall the specific exponentiation techniques used on this
type of curves. In section 4 we introduce the new generator of pairs
$(k, [k]P)$. Section 6 provides a security proof for $(k, [k]P)$-based
probabilistic signature schemes, through a fine-grained analysis of the
probability distribution of the generator (theorem 2), and using a new result
on the security of probabilistic signature schemes (theorem 1). Finally, we
propose in section 7 a choice of parameters resulting in a significant
increase of speed compared to existing algorithms, with a proven security
level.

2 Elliptic Curves on $\mathbb{F}_{2^n}$

2.1 Definition of an Elliptic Curve 

An elliptic curve is the set of points $(x, y)$ which are solutions of a
bivariate cubic equation over a field $K$ [7]. An equation of the form:

$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 , \qquad (1)$$

where $a_i \in K$, defines an elliptic curve over $K$.

In the field $\mathbb{F}_{2^n}$ of characteristic 2, equation (1) can be
reduced to the form:

$$y^2 + xy = x^3 + ax^2 + b \quad \text{with } a, b \in \mathbb{F}_{2^n} .$$

The set of points on an elliptic curve, together with a special point $O$
called the point at infinity, has an abelian group structure and therefore an
addition operation. The formula for this addition is provided in [13].

2.2 Computing a Multiple of a Point

The operation of adding a point $P$ to itself $d$ times is called scalar
multiplication by $d$ and denoted $[d]P$. Scalar multiplication is the basic
operation for EC protocols. Scalar multiplication in the group of points of an
elliptic curve is analogous to exponentiation in the multiplicative group of
integers modulo a fixed integer $p$.

Computing $[d]P$ is usually done with the addition-subtraction method based
on the nonadjacent form (NAF) of the integer $d$, which is a signed binary
expansion without two consecutive nonzero coefficients:

$$d = \sum_{i=0}^{\ell-1} c_i 2^i ,$$

with $c_i \in \{-1, 0, 1\}$ and $c_i \cdot c_{i+1} = 0$ for all $i \ge 0$. The
NAF is said to be optimal because each positive integer has a unique NAF, and
the NAF of $d$ has the fewest nonzero coefficients of any signed binary
expansion of $d$ [2]. An algorithm for generating the NAF of any integer is
described in [9].
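
As an illustration of the definition above, the following Python sketch computes a NAF with the standard right-to-left rule (choose the digit so that the remainder becomes divisible by 4). It is not taken from [9]; the function name `naf` is a hypothetical choice, and digits are returned least significant first.

```python
def naf(d):
    """Return the NAF digits of a positive integer d, least significant first."""
    digits = []
    while d > 0:
        if d & 1:
            # c = 1 if d = 1 (mod 4), c = -1 if d = 3 (mod 4);
            # either way d - c is divisible by 4, so the next digit is 0.
            c = 2 - (d % 4)
            d -= c
        else:
            c = 0
        digits.append(c)
        d //= 2
    return digits
```

For example, `naf(7)` gives the digits of $8 - 1$, which has two nonzero coefficients instead of the three ones of the binary expansion of 7.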

3 Anomalous Binary Curves 

3.1 Definition and Frobenius Map

The anomalous binary curves or Koblitz curves [5] are two curves $E_0$ and
$E_1$ defined over $\mathbb{F}_2$ by

$$E_a : y^2 + xy = x^3 + ax^2 + 1 \quad \text{with } a \in \{0, 1\} . \qquad (2)$$

We define $E_a(\mathbb{F}_{2^n})$ as the set of points $(x, y)$ which are
solutions of (2) over $\mathbb{F}_{2^n}$.

Since the anomalous curves are defined over $\mathbb{F}_2$, if $P = (x, y)$ is
in $E_a(\mathbb{F}_{2^n})$, then the point $(x^2, y^2)$ is also in
$E_a(\mathbb{F}_{2^n})$. In addition, it can be checked that:

$$(x^4, y^4) + 2(x, y) = (-1)^{1-a} (x^2, y^2) , \qquad (3)$$

where $+$ denotes the addition of points on the curve. Let $\tau$ be the
Frobenius map over $\mathbb{F}_{2^n} \times \mathbb{F}_{2^n}$:

$$\tau(x, y) = (x^2, y^2) .$$

Equation (3) can be rewritten for all $P \in E_a(\mathbb{F}_{2^n})$ as

$$\tau^2 P + [2]P = (-1)^{1-a} \tau P .$$

This shows that the squaring map is equivalent to a multiplication by the
complex number $\tau$ satisfying

$$\tau^2 + 2 = (-1)^{1-a} \tau ,$$

and we say that $E_a$ has a complex multiplication by $\tau$ [5].
Consequently, a point on $E_a$ can be multiplied by any element of the ring
$\mathbb{Z}[\tau] = \{x + y \cdot \tau \mid x, y \in \mathbb{Z}\}$.

3.2 Faster Scalar Multiplication

The advantage of using the multiplication by $\tau$ is that squaring is very
fast in $\mathbb{F}_{2^n}$. Consequently, it is advantageous to rewrite the
exponent $d$ as a signed $\tau$-adic NAF

$$d = \sum_{i=0}^{n+1} c_i \tau^i \bmod (\tau^n - 1) ,$$

with $c_i \in \{-1, 0, 1\}$ and $c_i \cdot c_{i+1} = 0$. This representation
is based on the fact that $\mathbb{Z}[\tau]$ is a Euclidean ring. An algorithm
for computing the $\tau$-adic NAF is given in [19]. This encoding yields the
following scalar multiplication algorithm:

Algorithm 1: Addition-subtraction method with $\tau$-adic NAF

Input: $P$
Output: $Q$
$Q \leftarrow [c_{n+1}]P$
for $i \leftarrow n$ to 0 do
    $Q \leftarrow \tau Q$
    if $c_i = 1$ then $Q \leftarrow Q + P$
    if $c_i = -1$ then $Q \leftarrow Q - P$
return $Q$

The algorithm requires approximately n/3 point additions instead of n dou- 
bles and n/3 additions for the general case [19] . If we neglect the cost of squarings, 
this is four times faster. 
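
To make the $\tau$-adic NAF concrete, here is a Python sketch of the usual digit-by-digit expansion of an element $r_0 + r_1\tau$ of $\mathbb{Z}[\tau]$, using only the relation $\tau^2 = \mu\tau - 2$ with $\mu = (-1)^{1-a}$. It is a simplified illustration, not the algorithm of [19] as used in practice: it omits the reduction modulo $\tau^n - 1$, so the expansion of an integer $d$ is about twice as long as $n$. The function names are hypothetical.

```python
def tau_naf(r0, r1, mu):
    """tau-adic NAF digits of r0 + r1*tau (least significant first), mu = +1 or -1."""
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2 == 1:
            # Pick u in {1, -1} so that the next digit is forced to be zero.
            u = 2 - ((r0 - 2 * r1) % 4)
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division by tau: (r0 + r1*tau) / tau = (r1 + mu*r0/2) - (r0/2)*tau.
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)
    return digits


def evaluate(digits, mu):
    """Evaluate sum(digits[i] * tau^i) in Z[tau]; returns (a, b) meaning a + b*tau."""
    a, b = 0, 0
    for u in reversed(digits):
        a, b = -2 * b, a + mu * b   # multiply the accumulator by tau
        a += u
    return a, b
```

Feeding the digits to Algorithm 1 (most significant first) replaces every doubling by an application of $\tau$, which is where the speed-up comes from.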

As in the general case, it is possible to reduce the number of point additions
by precomputing and storing some "small" $\tau$-adic multiples of $P$. [19]
describes an algorithm which requires the storage of

$$C(w) = \frac{2^w - (-1)^w}{3} \text{ points} ,$$

where $w$ is a trade-off parameter. Precomputation requires $C(w) - 1$
elliptic additions, and the scalar multiplication itself requires
approximately

$$\frac{n}{w + 1} \text{ elliptic additions} ,$$

which gives a total workload of

$$\simeq \frac{2^w}{3} + \frac{n}{w + 1} \text{ elliptic additions} .$$

For example, for the 163-bit curve $E_1(\mathbb{F}_{2^{163}})$ and $w = 4$, a
scalar multiplication can be performed in approximately 35 additions, instead
of 52 without precomputation.

When $P$ is known in advance, as is the case for protocols such as Elliptic
Curve Diffie-Hellman or ECDSA, it is possible to precompute and store the
"small" $\tau$-adic multiples of $P$ once and for all. The real-time
computation that remains is the scalar multiplication itself, which requires
around $n/(w + 1)$ operations when $C(w)$ points are stored. For example, for
the 163-bit curve $E_1(\mathbb{F}_{2^{163}})$, a scalar multiplication can be
performed with $w = 7$ in about 19 additions if 43 points are stored.

In the next section we describe an algorithm for producing random pairs 
{k, [k]P) which requires even fewer additions for approximately the same num- 
ber of points stored in memory. This algorithm appears to be well-suited for 
constrained environments such as smart-cards. 



4 Fast Generation of $(k, [k]P)$

4.1 A Simple Generator

Many EC cryptographic protocols such as Elliptic Curve Diffie-Hellman for key
exchange [13] and ECDSA for signature [13] require the production of pairs
$(k, [k]P)$ consisting of a random integer $k$ in the interval $[0, q - 1]$
and the point $[k]P$, where $q$ is a large prime divisor of the order of the
curve, and $P$ is a fixed point of order $q$.

For ECDSA this is the initial step of signature generation. The $x$ coordinate
of $[k]P$ is then converted into an integer $c$ modulo $q$ and the signature
of $m$ is $(c, s)$ where $s = (H(m) + d \cdot c)/k \bmod q$ and $d$ is the
private key associated to the public key $Q = d.P$.

[1] describes a simple method for generating random pairs of the form
$(x, g^x)$. This method can be easily adapted to the elliptic curve setting
for computing pairs $(k, [k]P)$, where $P$ is a point of order $q$.

Preprocessing:
    Generate $t$ integers $k_1, \ldots, k_t \in \mathbb{Z}_q$.
    Compute $P_j = k_j.P$ for each $j$ and store the $k_j$'s and the $P_j$'s in a table.

Pair generation:
    Randomly generate $S \subset [1, t]$ such that $|S| = \kappa$.
    Let $k = \sum_{j \in S} k_j \bmod q$.
    Let $Q = \sum_{j \in S} P_j$ and return $(k, Q)$.

The algorithm requires $\kappa - 1$ elliptic curve additions. Of course, the
generated $k$ is not uniformly distributed and the parameters have to be
chosen with great care so that the distribution of the generated $k$ is close
to the uniform random distribution.
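
The two phases of this simple generator can be sketched in Python. To keep the example self-contained it uses, as an assumption, a toy multiplicative group (the subgroup of order 11 generated by 2 modulo 23) in place of an elliptic curve, so the pair is $(k, g^k)$ as in [1] rather than $(k, [k]P)$. All names and parameters are hypothetical and far too small for real use.

```python
import random

P_MOD = 23   # toy modulus (assumption)
Q_ORD = 11   # prime order of G modulo P_MOD
G = 2        # generator of the subgroup of order Q_ORD


def preprocess(t, rng):
    """Build the table of t pairs (k_j, g^{k_j})."""
    table = []
    for _ in range(t):
        kj = rng.randrange(Q_ORD)
        table.append((kj, pow(G, kj, P_MOD)))
    return table


def generate_pair(table, kappa, rng):
    """Pick a random subset S of size kappa and combine the stored pairs."""
    subset = rng.sample(range(len(table)), kappa)
    k, elem = 0, 1
    for j in subset:
        kj, gj = table[j]
        k = (k + kj) % Q_ORD          # exponents add modulo q
        elem = (elem * gj) % P_MOD    # group elements are combined
    return k, elem
```

Starting the accumulator at the first selected element rather than at the identity gives the $\kappa - 1$ group operations mentioned above.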



4.2 The New Generator 

We consider the generator of figure 1 which produces random pairs of the form
$(k, [k]P)$ on a Koblitz curve defined over $\mathbb{F}_{2^n}$.

Preprocessing:
    Generate $t$ integers $k_1, \ldots, k_t \in \mathbb{Z}_q$.
    Compute $P_j = k_j.P$ for each $j$.
    Store the $k_j$'s and the $P_j$'s in a table.

Pair generation:
    Generate $\kappa$ random values $s_i = \pm 1$.
    Generate $\kappa$ random integers $e_i \in [0, n - 1]$.
    Generate $\kappa$ random indices $r_i \in [1, t]$.
    Let $k = \sum_{i=1}^{\kappa} s_i \cdot \tau^{e_i} \cdot k_{r_i} \bmod q$.
    Let $Q = \sum_{i=1}^{\kappa} s_i \cdot \tau^{e_i} \cdot P_{r_i}$.
    Return $(k, Q)$.

Fig. 1. Generation of $(k, [k]P)$ pairs on Koblitz curves

The difference with the previous generator is the use of the Frobenius map
$\tau$, which increases the entropy of the generated $k$. The new generator
requires $\kappa - 1$ elliptic curve additions and $t$ points stored in
memory. In the next section we describe an efficient implementation of the
new generator.

4.3 Implementing the Generator 

The new generator uses the Frobenius map $\tau$ extensively, as on average
$\kappa \cdot n/2$ applications of $\tau$ are performed for each generated
pair, which represents $\kappa \cdot n$ squarings.

Squaring comes essentially for free when $\mathbb{F}_{2^n}$ is represented in
terms of a normal basis: a basis of $\mathbb{F}_{2^n}$ over $\mathbb{F}_2$ of
the form

$$\{\theta, \theta^2, \theta^{2^2}, \ldots, \theta^{2^{n-1}}\} .$$

Namely, in this representation, squaring a field element is accomplished by a
one-bit cyclic rotation of the bitstring representing the element.
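
The rotation property can be verified on a very small field. The Python sketch below works in $\mathbb{F}_{2^3}$ built, as an assumption chosen only for illustration, from the polynomial $x^3 + x^2 + 1$, whose root $\theta = x$ generates the normal basis $\{\theta, \theta^2, \theta^4\}$. It compares squaring computed in the polynomial basis with a cyclic rotation of the normal-basis coordinates. The names are hypothetical.

```python
MOD_POLY = 0b1101   # x^3 + x^2 + 1 (assumed toy field polynomial)
N = 3


def gf_mul(a, b):
    """Multiply two elements of F_8 given as 3-bit integers (polynomial basis)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= MOD_POLY
    return r


THETA = 0b010                      # theta = x
BASIS = [THETA]
for _ in range(N - 1):             # theta^2, theta^4
    BASIS.append(gf_mul(BASIS[-1], BASIS[-1]))


def from_normal(coords):
    """Convert normal-basis coordinates (c_0, c_1, c_2) to the polynomial basis."""
    value = 0
    for c, basis_element in zip(coords, BASIS):
        if c:
            value ^= basis_element
    return value


def square_normal(coords):
    """Squaring in a normal basis: cyclic rotation, c_i moves to position i+1."""
    return (coords[-1],) + tuple(coords[:-1])
```

The rotation works because squaring is linear in characteristic 2 and maps $\theta^{2^i}$ to $\theta^{2^{i+1}}$, with $\theta^{2^n} = \theta$.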

Elliptic curve additions will be performed using a polynomial basis
representation of the elements, for which efficient algorithms for field
multiplication and inversion are available. A polynomial basis is a basis of
$\mathbb{F}_{2^n}$ over $\mathbb{F}_2$ of the form

$$\{1, x, x^2, \ldots, x^{n-1}\} .$$

The points $P_j$ are stored using a normal basis representation. When a new
pair is generated, the point $\tau^{e_i} \cdot P_{r_i}$ is computed by
successive rotations of the coordinates of $P_{r_i}$. Then
$\tau^{e_i} \cdot P_{r_i}$ is converted into a polynomial basis representation
and it is added to the accumulator $Q$. To convert from normal to polynomial
basis, we simply store the change-of-basis matrix. The conversion time is then
approximately equivalent to one field multiplication, and this method requires
the storage of $O(n^2)$ bits.

Preprocessing:
    Generate $t$ random integers $k_1, \ldots, k_t \in \mathbb{Z}_q$.
    Compute $P_j = k_j.P$ for each $j$.
    Store the $k_j$'s and the $P_j$'s in normal basis.

Pair generation:
    Generate $\kappa$ random integers $e_i \in [0, n - 1]$.
    Sort the $e_i$: $e_1 \ge e_2 \ge \ldots \ge e_\kappa$.
    Set $e_{\kappa+1} \leftarrow 0$.
    Set $Q \leftarrow O$ and $k \leftarrow 0$.
    For $i \leftarrow 1$ to $\kappa$ do:
        Generate a random integer $r \in [1, t]$.
        Generate a random $s \leftarrow \pm 1$.
        Compute $R \leftarrow s \cdot \tau^{e_i} \cdot P_r$ in normal basis.
        Convert $R$ into polynomial basis.
        Compute $Q \leftarrow Q + R$.
        Compute $k \leftarrow \tau^{e_i - e_{i+1}} \cdot (k + s \cdot k_r) \in \mathbb{Z}[\tau]$.
    Convert $k$ into an integer.
    Return $(k, Q)$.

Fig. 2. Algorithm for implementing the generator of $(k, [k]P)$ pairs for Koblitz curves

Before a new pair $(k, [k]P)$ is computed, the integers $e_i$ are sorted:
$e_1 \ge e_2 \ge \cdots \ge e_\kappa$, so that $k$ can be rewritten as

$$k = \tau^{e_\kappa} \left( s_\kappa \cdot k_{r_\kappa} + \tau^{e_{\kappa-1} - e_\kappa} \left( s_{\kappa-1} \cdot k_{r_{\kappa-1}} + \cdots \right) \right) .$$

The integer $k$ is computed in the ring $\mathbb{Z}[\tau]$ as
$k = k' + k'' \cdot \tau$ where $k', k'' \in \mathbb{Z}$. The element
$k \in \mathbb{Z}[\tau]$ is finally converted into an integer by replacing
$\tau$ by an integer $T$ in $\mathbb{Z}_q$, solution of the equation

$$T^2 + 2 = (-1)^{1-a}\, T \bmod q ,$$

so that for any point $Q$, we have $\tau(Q) = [T]Q$.

The implementation of the generator is summarized in figure 2.
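
The scalar side of this computation (the value $k$, not the point $Q$) can be sketched in Python. The sketch accumulates $k$ in $\mathbb{Z}[\tau]$ with the nested (Horner-like) form above and then substitutes $T$ for $\tau$. The parameters are toy assumptions chosen so that the example runs: $q = 11$, $\mu = (-1)^{1-a} = 1$ and $T = 7$, which satisfies $T^2 + 2 = T \bmod 11$. The function names and the returned list of terms are hypothetical conveniences for checking the result.

```python
import random

MU = 1          # (-1)^(1-a) for a = 1
Q_ORDER = 11    # toy prime "group order" (assumption, far too small for real use)
T = 7           # root of T^2 + 2 = MU*T modulo Q_ORDER


def tau_mul(z, times=1):
    """Multiply z = (a, b), meaning a + b*tau, by tau the given number of times."""
    a, b = z
    for _ in range(times):
        a, b = -2 * b, a + MU * b   # uses tau^2 = MU*tau - 2
    return (a, b)


def generate_k(table, n, kappa, rng):
    """Return k as an integer mod q, plus the (s, e, k_r) terms that were used."""
    exps = sorted((rng.randrange(n) for _ in range(kappa)), reverse=True)
    exps.append(0)                  # e_{kappa+1} = 0
    k = (0, 0)
    terms = []
    for i in range(kappa):
        r = rng.randrange(len(table))
        s = rng.choice((1, -1))
        terms.append((s, exps[i], table[r]))
        k = (k[0] + s * table[r], k[1])           # k + s*k_r
        k = tau_mul(k, exps[i] - exps[i + 1])     # times tau^(e_i - e_{i+1})
    k_int = (k[0] + k[1] * T) % Q_ORDER           # replace tau by T
    return k_int, terms
```

Sorting the exponents means the accumulator is multiplied by $\tau$ at most $n - 1$ times in total, instead of up to $n - 1$ times per term.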



5 Lattice Reduction Attacks and Hidden Subsets 

When the generator is used in ECDSA, each signature $(c, s)$ of a message $m$
yields a linear equation

$$k \cdot s = H(m) + d \cdot c \bmod q ,$$

where $d$ is the unknown secret key and $k$ is a sum of the hidden terms
$\pm \tau^i \cdot k_j$.

The generator of [1] described in section 4.1, for which $k$ is a sum of the
hidden terms $k_i$, has been attacked by Nguyen and Stern [11]. However, the
attack requires the number of hidden $k_i$ to be small (around 45 for a
160-bit integer $k$). The security of the generator relies on the difficulty
of the hidden-subset sum problem studied in [11]: given a positive integer $M$
and $b_1, \ldots, b_m \in \mathbb{Z}_M$, find
$\alpha_1, \ldots, \alpha_n \in \mathbb{Z}_M$ such that each $b_i$ is some
subset sum of $\alpha_1, \ldots, \alpha_n$ modulo $M$.

For the new generator, if one simply considers all $\tau^i \cdot k_j$ to be
hidden, this yields a large number of hidden terms ($n \cdot t$, where $n$ is
the field size and $t$ the number of stored points) which cannot be handled by
[11]. We did not find any way of adapting [11] to our new generator.



6 Security Proof for Signature Schemes 
Using the New Generator 



Since the generated integers $k$ are not uniformly distributed, the security
might be considerably weakened when the generator is used in conjunction with
a signature scheme, a key-exchange scheme or an encryption scheme. In this
section, we provide a security proof in the case of probabilistic signature
schemes.

In the following, we relate the security of a signature scheme using a truly
random generator with the security of the same signature scheme using our
generator. Resistance against adaptive chosen message attacks is considered.
This question was initially raised by [12], and we improve the result of
[12, p. 9].

Let $S$ be a probabilistic signature scheme. Denote by $\mathcal{R}$ the set
of random elements used to generate the signature. In our case of interest,
$\mathcal{R}$ will be $\{0, \ldots, q - 1\}$. Let $G$ be a random variable on
$\mathcal{R}$. Define $S_G$ as the signature scheme identical to $S$, except
that its generation algorithm uses $G$ as random source instead of a truly
random number generator.

The following theorem shows that if a signature scheme using a truly random
number generator is secure, the corresponding signature scheme using $G$ will
be secure if the distribution of $G$ is sufficiently close to the uniform
distribution. The proof is given in the appendix.

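If $X$ is a random variable on a set $\Omega$, we denote by $\delta_2(X)$ the
statistical distance defined by

$$\delta_2(X) := \left( \sum_{\omega \in \Omega} \left| \Pr(X = \omega) - \frac{1}{|\Omega|} \right|^2 \right)^{1/2} .$$

In the same way, we define

$$\delta_1(X) := \sum_{\omega \in \Omega} \left| \Pr(X = \omega) - \frac{1}{|\Omega|} \right| .$$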



Theorem 1. Let $A_G$ be an adaptive chosen message attack against the
signature scheme $S_G$, during which at most $m$ signature queries are
performed. Let $A$ be the corresponding attack on the signature scheme $S$.
The probabilities of existential forgery satisfy

$$\left| \Pr(A \text{ succeeds}) - \Pr(A_G \text{ succeeds}) \right| \le \frac{(1 + |\mathcal{R}|\,\delta_2(G)^2)^{m/2} - 1}{(1 + |\mathcal{R}|\,\delta_2(G)^2)^{1/2} - 1} \sqrt{|\mathcal{R}| \Pr(A \text{ succeeds})}\; \delta_2(G) .$$

Note that asymptotically for $|\mathcal{R}|\,\delta_2(G)^2 \ll 1$, the bound
of theorem 1 yields the inequality

$$\left| \Pr(A \text{ succeeds}) - \Pr(A_G \text{ succeeds}) \right| \le m \sqrt{|\mathcal{R}| \Pr(A \text{ succeeds})}\; \delta_2(G) , \qquad (4)$$

which has to be compared to the inequality of [12],

$$\left| \Pr(A \text{ succeeds}) - \Pr(A_G \text{ succeeds}) \right| \le m\, \delta_1(G) .$$
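
For readers who want to evaluate these bounds numerically, the following Python sketch computes $\delta_2$ for an explicit distribution and evaluates both the bound of theorem 1 and its asymptotic form (4). It is a hedged illustration: the function names are hypothetical, floating-point arithmetic is used (so it is only suitable for small toy distributions, not for $q \approx 2^{163}$), and the case $\delta_2(G) = 0$ is handled by taking the limit of the ratio, which is $m$.

```python
import math


def delta2(probs):
    """Euclidean distance between a distribution and the uniform one."""
    n = len(probs)
    return math.sqrt(sum((p - 1.0 / n) ** 2 for p in probs))


def theorem1_bound(size_r, d2, m, p_success):
    """Right-hand side of theorem 1 for |R| = size_r and delta_2(G) = d2."""
    root = math.sqrt(1.0 + size_r * d2 * d2)
    if root == 1.0:
        geometric = float(m)        # limit of (root^m - 1)/(root - 1)
    else:
        geometric = (root ** m - 1.0) / (root - 1.0)
    return geometric * math.sqrt(size_r * p_success) * d2


def asymptotic_bound(size_r, d2, m, p_success):
    """Right-hand side of inequality (4)."""
    return m * math.sqrt(size_r * p_success) * d2
```

The ratio in theorem 1 is the closed form of the geometric sum $\sum_{k=1}^{m} (1 + |\mathcal{R}|\delta_2(G)^2)^{(k-1)/2}$, which is why it tends to $m$ when $|\mathcal{R}|\delta_2(G)^2$ is small.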



In the following, we consider our generator of pairs $(k, [k]P)$ of section 4,
which we denote by $k$, and compute its statistical distance $\delta_2(k)$ to
the uniform distribution. Using the previous theorem with $G = k$ and
$\mathcal{R} = \{0, \ldots, q - 1\}$, this will provide a security proof for a
signature scheme using our generator.

The following theorem is a direct application of a result presented in [12].
It gives a bound on the expectation of $\delta_2(k)^2$, this expectation being
taken over a uniform choice of $k_1, \ldots, k_t$.

Theorem 2. If the $k_i$ are independent random variables uniformly distributed
in $\{0, \ldots, q - 1\}$, then the average of $\delta_2(k)^2$ over the choice
of $k_1, \ldots, k_t$ satisfies


EMk)^] < 



1 

(2^)'=C) ■ 



In order to use this inequality, we have to link $\delta_2(k)$ to
$E[\delta_2(k)^2]$; a simple application of Markov's inequality yields:

Theorem 3. Let $\varepsilon > 0$. With probability at least
$1 - \varepsilon$ (this probability being related to a uniform choice of
$k_1, \ldots, k_t$), we have

$$\delta_2(k) \le \sqrt{\frac{E[\delta_2(k)^2]}{\varepsilon}} .$$



Theorem 1 shows that the parameter which measures the security of the
signature scheme using our generator is
$\sqrt{|\mathcal{R}|}\,\delta_2(G) = \sqrt{q} \cdot \delta_2(k)$. In table 1
we summarize several values of the bound on $\sqrt{q \cdot E[\delta_2(k)^2]}$
of theorem 2, which using theorem 3 provides an upper bound for
$\sqrt{q} \cdot \delta_2(k)$. We stress that the number $t$ of points to be
stored has to be corrected by the amount of data required to convert from
normal to polynomial basis. Roughly, one must add to $t$ the equivalent of
$n/2$ points of the curve to obtain the total amount of storage needed.

For example, consider the ECDSA signature scheme using our generator with
a field size $n = 163$, $\kappa - 1 = 15$ point additions and $t = 100$
precomputed points.

Table 1. $\log_2 \sqrt{q \cdot E[\delta_2(k)^2]}$ for various values of $\kappa$ and $t$ for $n = 163$

  κ \ t     25     50    100    150    200
   10       31     26     21     18     16
   14       15      7      0     -4     -6
   16        6     -2    -10    -15    -18
   18       -1    -10    -19    -25    -28
   20       -9    -19    -29    -35    -39
   25      -27    -40    -53    -60    -65



Assume that up to m = 2^® messages can be signed by the signer. Using table 1, 
we have $\sqrt{q \cdot E[\delta_2(k)^2]} \simeq 2^{-10}$. Using the inequality of theorem 3, we know that,
except with probability $2^{-10}$, we have $\sqrt{q}\,\delta_2(k) < 2^{-10}/2^{-5} = 2^{-5}$. Assume that
for a given time bound, the probability of any attack A breaking the ECDSA 
signature scheme with a truly random generator after m = 2^^ signature queries, 
is smaller than 2“®° for n = 163. Then the probability of breaking the ECDSA 
signature scheme with our generator in the same time bound is smaller than 

Pr {Ag succeeds) < 2^^ ■ ■ 2~^ = 2~^^ . 

This shows that the ECDSA signature scheme remains secure against chosen 
message attacks when using our generator for this set of parameters. 

7 Parameters and Performances 

We propose two sets of parameters for the field size $n = 163$. The first one
is $\kappa = 16$ and $t = 100$ (which corresponds to 15 additions of points),
the second is $\kappa = 11$ and $t = 50$ (which corresponds to 10 additions).
The first set of parameters provides a provable security level according to
the previous section, whereas the second set of parameters lies in a grey area
where the existing attacks by lattice reduction do not apply, but security is
not proven.

Recall that the scalar multiplication algorithm described in section 3.2
requires 19 elliptic curve additions with 43 points stored. Thus, the two
proposed parameter sets induce a 21% and a 47% speed-up factor,
respectively¹.

¹ If we neglect the cost of squaring the $P_j$'s, converting from normal to
polynomial basis and computing $k$.

8 Conclusion

We have introduced a new generator of pairs $(k, [k]P)$ for anomalous binary
curves. This generator can be used for key exchange (ECDH), signature (ECDSA)
and encryption (El-Gamal schemes). We have shown that for an appropriate
choice of parameters, a probabilistic signature scheme using our generator
remains secure against chosen message attacks. This result can be extended to
key exchange schemes and encryption schemes.

We have provided a first set of parameters which provides a speed-up factor
of 21% over existing techniques, with a proven security level. The second set
of parameters provides a speed-up factor of 47%, but no security proof is
available. However, since security is proven for slightly larger parameters,
this provides a convincing argument that the generator has a sound design and
should be secure even for smaller parameters.

A Proof of Theorem 1

Theorem 1. Let $S$ be a probabilistic signature scheme. Let $\mathcal{R}$ be
the set from which the signature generation algorithm chooses a random element
when generating a signature. Let $G$ be a random variable in $\mathcal{R}$,
and $S_G$ the scheme derived from $S$ which uses $G$ as random source instead
of a random oracle for the signature generation. Let $A_G$ be an adaptive
attack with $m$ chosen messages on $S_G$. If $A$ is the corresponding attack
on $S$, then the probabilities of existential forgery satisfy

$$\left| \Pr(A \text{ succeeds}) - \Pr(A_G \text{ succeeds}) \right| \le \frac{(1 + |\mathcal{R}|\,\delta_2(G)^2)^{m/2} - 1}{(1 + |\mathcal{R}|\,\delta_2(G)^2)^{1/2} - 1} \sqrt{|\mathcal{R}| \Pr(A \text{ succeeds})}\; \delta_2(G) .$$

Proof. An adaptive attack with $m$ chosen messages makes $m$ queries to a
signature oracle. At each call, this oracle picks a random $r$ in
$\mathcal{R}$, and uses this $r$ to produce a signature. If the signature
scheme is $S$, $r$ is chosen uniformly in $\mathcal{R}$, and is thus equal to
the value of a random variable $U$ uniformly distributed in $\mathcal{R}$. If
the signature scheme is $S_G$, $r$ is the value of the random variable $G$.
Consequently, an attack with $m$ chosen messages depends on a random variable
defined over the probability space $\mathcal{R}^m$. This variable is either
$\mathbf{U} = (U_1, \ldots, U_m)$ in the case of an attack against $S$, or
$\mathbf{G} = (G_1, \ldots, G_m)$ in the case of an attack against $S_G$,
where the $U_i$ are pairwise independent and follow the same distribution as
$U$, and the $G_i$ are pairwise independent and follow the same distribution
as $G$.

The following proof is a refinement of the result that can be found in [12]
concerning accelerated signature schemes. First note that as $A$ and $A_G$ are
the same attacks (that is, are the same Turing machines making calls to the
same signature oracle except that they use different random sources), for all
$r \in \mathcal{R}^m$,

$$\Pr(A \text{ succeeds} \mid \mathbf{U} = r) = \Pr(A_G \text{ succeeds} \mid \mathbf{G} = r) .$$

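Thus, using Bayes' formula, we get

$$\left| \Pr(A \text{ succeeds}) - \Pr(A_G \text{ succeeds}) \right| \le \sum_{r \in \mathcal{R}^m} \left| \Pr(\mathbf{G} = r) - \Pr(\mathbf{U} = r) \right| \Pr(A \text{ succeeds} \mid \mathbf{U} = r) . \qquad (5)$$

Using the triangle inequality, the independence of the $U_i$ and of the $G_i$,
and the equidistribution property, we also get, for $r = (r_1, \ldots, r_m)$,

$$\left| \Pr(\mathbf{U} = r) - \Pr(\mathbf{G} = r) \right| \le \sum_{k=1}^{m} \left( \prod_{1 \le i < k} \Pr(G = r_i) \right) \left| \Pr(U = r_k) - \Pr(G = r_k) \right| \left( \prod_{m \ge i > k} \Pr(U = r_i) \right) ,$$

with the convention that the product of zero terms is equal to 1.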

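Consequently, if we denote, for $k = 1, \ldots, m$, by $a_k(r)$ the quantity

$$a_k(r) = \left( \prod_{1 \le i < k} \Pr(G = r_i) \right) \left| \Pr(U = r_k) - \Pr(G = r_k) \right| \left( \prod_{m \ge i > k} \Pr(U = r_i) \right) , \qquad (6)$$

equation (5) can be rewritten as

$$\left| \Pr(A \text{ succeeds}) - \Pr(A_G \text{ succeeds}) \right| \le \sum_{k=1}^{m} \sum_{r \in \mathcal{R}^m} a_k(r) \Pr(A \text{ succeeds} \mid \mathbf{U} = r) . \qquad (7)$$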

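Using Cauchy's inequality,

$$\sum_{r \in \mathcal{R}^m} a_k(r) \Pr(A \text{ succeeds} \mid \mathbf{U} = r) \le \left( \sum_{r \in \mathcal{R}^m} |\mathcal{R}|^{m} a_k(r)^2 \right)^{1/2} \left( \sum_{r \in \mathcal{R}^m} |\mathcal{R}|^{-m} \Pr(A \text{ succeeds} \mid \mathbf{U} = r)^2 \right)^{1/2} . \qquad (8)$$

And as $\Pr(A \text{ succeeds} \mid \mathbf{U} = r) \le 1$,

$$\sum_{r \in \mathcal{R}^m} |\mathcal{R}|^{-m} \Pr(A \text{ succeeds} \mid \mathbf{U} = r)^2 \le \sum_{r \in \mathcal{R}^m} |\mathcal{R}|^{-m} \Pr(A \text{ succeeds} \mid \mathbf{U} = r) = \Pr(A \text{ succeeds}) , \qquad (9)$$

because $U$ is uniformly distributed over $\mathcal{R}$. Returning to the
definition of $a_k(r)$, and using once again the uniformity of $U$, one sees
that

$$\sum_{r \in \mathcal{R}^m} |\mathcal{R}|^{m} a_k(r)^2 \le |\mathcal{R}| \left( \prod_{1 \le i < k} \sum_{r_i \in \mathcal{R}} |\mathcal{R}| \Pr(G = r_i)^2 \right) \sum_{r_k \in \mathcal{R}} \left| \Pr(U = r_k) - \Pr(G = r_k) \right|^2 . \qquad (10)$$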

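Now, one needs to note that

$$\sum_{r_i \in \mathcal{R}} |\mathcal{R}| \Pr(G = r_i)^2 = \sum_{r_i \in \mathcal{R}} |\mathcal{R}| \left( \left( \Pr(G = r_i) - \frac{1}{|\mathcal{R}|} \right)^2 - \frac{1}{|\mathcal{R}|^2} + \frac{2}{|\mathcal{R}|} \Pr(G = r_i) \right) = |\mathcal{R}|\,\delta_2(G)^2 + 1 .$$

Thus, the inequality (10) becomes

$$\sum_{r \in \mathcal{R}^m} |\mathcal{R}|^{m} a_k(r)^2 \le |\mathcal{R}| \left( 1 + |\mathcal{R}|\,\delta_2(G)^2 \right)^{k-1} \sum_{r_k \in \mathcal{R}} \left| \Pr(U = r_k) - \Pr(G = r_k) \right|^2 = |\mathcal{R}| \left( 1 + |\mathcal{R}|\,\delta_2(G)^2 \right)^{k-1} \delta_2(G)^2 .$$

Returning to inequality (7), and using (8) and (9), we finally get:

$$\left| \Pr(A \text{ succeeds}) - \Pr(A_G \text{ succeeds}) \right| \le \sum_{k=1}^{m} \left( |\mathcal{R}| \left( 1 + |\mathcal{R}|\,\delta_2(G)^2 \right)^{k-1} \delta_2(G)^2 \right)^{1/2} \left( \Pr(A \text{ succeeds}) \right)^{1/2}$$

$$= \left( |\mathcal{R}| \Pr(A \text{ succeeds}) \right)^{1/2} \delta_2(G)\, \frac{(1 + |\mathcal{R}|\,\delta_2(G)^2)^{m/2} - 1}{(1 + |\mathcal{R}|\,\delta_2(G)^2)^{1/2} - 1} .$$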



□

Abstract. This paper compares different approaches for computing
power products $\prod_{1 \le i \le k} g_i^{e_i}$ in commutative groups. We look at the
conventional simultaneous exponentiation approach and present an
alternative strategy, interleaving exponentiation. Our comparison shows that
in general groups, sometimes the conventional method and sometimes
interleaving exponentiation is more efficient. In groups where inverting
elements is easy (e.g. elliptic curves), interleaving exponentiation with
signed exponent recoding usually wins over the conventional method.



1 Introduction 

A common task in implementations of many public-key cryptosystems is
multi-exponentiation in some commutative group G, i.e. evaluating a product

$\prod_{1 \le i \le k} g_i^{e_i}$

where $k \ge 2$ is a small integer, each $g_i$ is an element of G, and each $e_i$ is an
integer (typically a few hundred up to a few thousand bits long). We require that
the $e_i$ be non-negative (otherwise, invert $g_i$). Example groups include $(\mathbb{Z}/n\mathbb{Z})^*$
for some integer n, e.g. for verification of ElGamal [11] or DSA [17] signatures;
groups of rational points on elliptic curves over finite fields, e.g. for verification of
ECDSA [1] signatures; and class groups of imaginary-quadratic orders, e.g. for
verification of RDSA [2] [7] signatures. We have k = 2 for DSA and ECDSA
verification and k = 3 for ElGamal and RDSA verification. Larger values of k
appear in protocols of Brands [4]. In the present paper, we allow k = 1 as well for
algorithms; efficiency considerations may ignore this case. It is well known that
in general it is unnecessarily inefficient to compute the powers $g_i^{e_i}$ separately
and then multiply them. Instead, specific algorithms for multi-exponentiation
are usually applied.

We assume that the $e_i$ consist of independent random bits up to a respective
maximum bit-length $b_i$; i.e., $e_i$ is a uniformly distributed random integer in the
interval $[0, 2^{b_i} - 1]$. (In practice the actual distribution may differ, but for typical
cases this simplified assumption is reasonably close.) In this setting, we consider
general algorithms for arbitrary exponents; we do not examine algorithms based
on tailor-made addition chains in $\mathbb{Z}^k$ for given $e_1, \ldots, e_k$ (cf. [3]). (Note that even
if an exponent is fixed in a cryptographic protocol, it is sometimes desirable to
perform computations using varying exponents in order to thwart side-channel
attacks that try to use timings [12] or power consumption measurements [13]
or other extra data to gain knowledge of secret exponents. To avoid constant
exponents, $g^e$ can be rewritten as $g^{e-n} \cdot g^n$, and $\prod_i g_i^{e_i}$ can be rewritten as
$\bigl(\prod_i g_i^{e_i - n_i}\bigr) \cdot \bigl(\prod_i g_i^{n_i}\bigr)$ for arbitrary integers $n$ and $n_i$.)

Like window-based algorithms for single exponentiations, the algorithms that
we analyse work in two stages: First, in the precomputation stage, an auxiliary
table of group elements is computed from the elements $g_i$; then, in the evaluation
stage, the final result is computed using these auxiliary values.

The usual approach for multi-exponentiation combines all input group
elements $g_i$ with each other in the precomputation stage ([11], [20], [21]); then the
evaluation stage looks at all exponents simultaneously. In the present paper, we
discuss an alternative approach where the $g_i$ are treated separately in the
precomputation stage. In this approach, the evaluation stage uses an interleaving
of the generators and exponents for the various i rather than handling multiple
i simultaneously.

We collectively refer to the multi-exponentiation methods described in [11], 
[20] and [21] as “simultaneous exponentiation”. Section 2 describes these meth- 
ods. Section 3 presents two variants of our alternative approach, which we dub 
“interleaving exponentiation”: a basic method and an alternative method that 
can be used in groups where inverting elements is easy. In section 4, we compare 
the efficiency of simultaneous exponentiation methods and interleaving exponen- 
tiation methods. Section 5 discusses variants that can be advantageous when all 
bases gi are fixed. 

In specific groups, additional useful efficiently computable endomorphisms 
are available besides squaring and possibly inversion (see e.g. [19]); this may 
lead to better multi-exponentiation algorithms for these groups. Such special 
groups are out of the scope of the present paper. 

1.1 Notation 

We write e[j] for bit j of a non-negative integer e. For negative j, we define that
e[j] = 0. We write e[j ... j'] for the integer consisting of the concatenation of
bits j down to j' of e; e.g., if $e = 10111_2 = 23$, then $e[3 \ldots 1] = 011_2 = 3$ and
$e[1 \ldots -2] = 1100_2 = 12$.
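
The bit-range notation above can be made concrete with a small helper. This is only an illustrative sketch; the function name `bits` and its argument names are our own and do not appear in the paper.

```python
def bits(e: int, j: int, j_low: int) -> int:
    """Return e[j ... j_low]: bits j down to j_low of the non-negative
    integer e, treating bits at negative positions as 0."""
    if e < 0 or j < j_low:
        raise ValueError("need e >= 0 and j >= j_low")
    width = j - j_low + 1
    # A negative lower index appends zero bits at the low end.
    shifted = e >> j_low if j_low >= 0 else e << (-j_low)
    return shifted & ((1 << width) - 1)


# The two examples from the text, with e = 10111 (binary) = 23.
print(bits(23, 3, 1), bits(23, 1, -2))
```

A single bit e[j] is then `bits(e, j, j)`, which evaluates to 0 for every negative j, matching the convention stated above.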

2 Simultaneous Exponentiation Methods 

We look at two multi-exponentiation methods using simultaneous exponentiation
(as opposed to interleaving exponentiation, which is introduced in section 3):
Straus’s $2^w$-ary method (section 2.1) and the sliding window method of Yen,
Laih, and Lenstra (section 2.2). (The method known as “Shamir’s trick” appears
as a special case of both of these.)

As noted in the introduction, all algorithms that we consider are related and
work in two stages: First, the precomputation stage prepares an auxiliary table
of group elements; then, the evaluation stage computes the final result using this
table. For comparing different methods, we examine the two stages separately.

When examining simultaneous exponentiation algorithms, we assume that $b_i$
is the same for all i. Let at least one $e_i$ be non-zero, and let b be the bit-length
of the longest of the $e_i$. Parameter w is always a positive integer, the “window
size”; larger window sizes make the precomputation stage less efficient, but speed
up the evaluation stage. It is not possible to give a general rule for selecting an
optimal w (cf. section 4).

Relevant features of the precomputation stage are the number of group
operations required for computing the auxiliary table, and the number of table
entries. For group operations, we differentiate between squarings and general
multiplications, since the former often can be computed more efficiently. The
precomputed tables will always contain the values $g_1, \ldots, g_k$, all of which are
trivially available and hence can be neglected. It will be visible that computing
each additional table entry requires one multiplication or, for some of the table
entries in the simultaneous $2^w$-ary method, one squaring. In addition to this, k
squarings are needed by the simultaneous sliding window method if w > 1.

The evaluation stage requires both squarings and multiplications. For each 
multi-exponentiation method, we look at the number of squarings and the ex- 
pected number of general multiplications for given k, b, and w. w is assumed to 
be small in comparison to b (otherwise the precomputation stage would become 
unreasonably expensive) . 

It should be noted that a slight optimisation for the precomputation stage is 
possible in all methods by first looking which table entries are actually needed 
(either during the evaluation stage, or because other precomputed table entries 
that are needed in the evaluation stage depend on them) and limiting precom- 
putation to these. As this optimisation will usually only have a small effect in 
practice, we neglect it in our comparisons. 

For the number of squarings in the evaluation stage, we assume that the
following optimisation is used: As initially variable A is $1_G$ (the neutral element
of G) in all algorithms, squarings can easily be avoided until a different value
has been assigned to A.

Formulas for the expected number of multiplications during the evaluation 
stage given in the following are actually asymptotics for large b/w rather than 
precise values (we do not take into account the special probability distributions 
encountered at both ends of the exponents). As in practice w will be much 
smaller than b, the error is negligible for our purposes. 

Just as squarings can be eliminated in the evaluation stage while A is $1_G$, the
first multiplication of A by a table entry can be replaced by an assignment. This
minor optimisation is not used in our figures below; note that it applies similarly
to all algorithms discussed in this paper (and does not affect asymptotics), so
comparisons between different methods remain just as valid.

2.1 Simultaneous $2^w$-Ary Method

The simultaneous $2^w$-ary exponentiation method [20] (see also [15]) looks at w
bits of each of the exponents for each evaluation stage group multiplication,
i.e. kw bits in total. The special case where w = 1 is also known as “Shamir’s
trick” since it was described in [11] with a reference to Shamir.

Precomputation Stage. Precompute $\prod_{1 \le i \le k} g_i^{E_i}$ for all non-zero k-tuples
$(E_1, \ldots, E_k) \in \{0, \ldots, 2^w - 1\}^k$.

Number of non-trivial table entries: $2^{kw} - 1 - k$. Of these, $2^{k(w-1)} - 1$ can be
computed by squaring other table entries (all the $E_i$ are even). The remaining
$2^{kw} - 2^{k(w-1)} - k$ require one general multiplication each.

No additional squarings are required.

Evaluation Stage.

    A ← 1_G
    for j = ⌊(b - 1)/w⌋ · w down to 0 step w do
        for n = 1 to w do
            A ← A^2
        if (e_1[j + w - 1 ... j], ..., e_k[j + w - 1 ... j]) ≠ (0, ..., 0) then
            A ← A · ∏_i g_i^(e_i[j + w - 1 ... j])    {multiply A by table entry}
    return A

Number of squarings: $\lfloor (b-1)/w \rfloor \cdot w$.

Expected number of multiplications: $b \cdot \frac{1 - 2^{-kw}}{w}$.
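
The evaluation stage above can be sketched in Python as follows. This is a minimal illustration under several assumptions of ours: the group is taken to be $(\mathbb{Z}/n\mathbb{Z})^*$ with a hypothetical modulus `n`, the function name is invented, the table is filled with `pow` for brevity (rather than with one group operation per entry as counted in the text), and the initial squarings of the neutral element are not skipped.

```python
from itertools import product


def simultaneous_2w_ary(gs, es, w, n):
    """Compute prod(g_i ** e_i) mod n with the simultaneous 2^w-ary method."""
    k = len(gs)
    b = max(e.bit_length() for e in es)  # bit-length of the longest exponent

    # Precomputation stage: one entry per non-zero k-tuple (E_1, ..., E_k).
    table = {}
    for E in product(range(1 << w), repeat=k):
        if any(E):
            entry = 1
            for g, Ei in zip(gs, E):
                entry = entry * pow(g, Ei, n) % n
            table[E] = entry

    # Evaluation stage: consume w bits of every exponent per iteration.
    A = 1
    mask = (1 << w) - 1
    j = ((b - 1) // w) * w
    while j >= 0:
        for _ in range(w):
            A = A * A % n
        E = tuple((e >> j) & mask for e in es)  # (e_1[j+w-1...j], ...)
        if any(E):
            A = A * table[E] % n
        j -= w
    return A


# Example with k = 3 bases and window size w = 2 (values chosen arbitrarily).
gs, es, n = [3, 5, 7], [91, 6, 127], 1009
print(simultaneous_2w_ary(gs, es, 2, n))
```

The dictionary keyed by exponent tuples is chosen for readability; an implementation concerned with memory layout would more likely index a flat array by the kw concatenated bits.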




2.2 Simultaneous Sliding Window Method

The simultaneous sliding window exponentiation method of Yen, Laih, and
A. Lenstra [21] is an improvement of the $2^w$-ary method described in section 2.1.
Due to the use of a sliding window, table entries are required only for those
tuples $(E_1, \ldots, E_k)$ where at least one of the $E_i$ is odd. (Note that while values $g_i^2$
no longer appear in the precomputed table, the precomputation stage now needs
them as intermediate values unless w = 1.) Also the expected number of
multiplications required in the evaluation stage is reduced. Like the $2^w$-ary method,
this method looks at w bits of each of the exponents for each evaluation stage
group multiplication (kw bits in total). For w = 1, this again is “Shamir’s trick”.
For k = 1, this is the usual sliding window method for a single exponentiation
(see e.g. [15]).

Precomputation Stage. Precompute $\prod_{1 \le i \le k} g_i^{E_i}$ for all k-tuples
$(E_1, \ldots, E_k) \in \{0, \ldots, 2^w - 1\}^k$ where at least one of the $E_i$ is odd.

Number of non-trivial table entries (multiplications): $2^{kw} - 2^{k(w-1)} - k$.

Number of squarings: k if w > 1; none otherwise.

Evaluation Stage.

    A ← 1_G
    j ← b - 1
    while j ≥ 0 do
        if ∀i ∈ {1, ..., k}: e_i[j] = 0 then
            A ← A^2; j ← j - 1
        else
            j_new ← max(j - w, -1)
            J ← j_new + 1
            while ∀i ∈ {1, ..., k}: e_i[J] = 0 do
                J ← J + 1
            {now j ≥ J > j_new}
            for i = 1 to k do
                E_i ← e_i[j ... J]
            while j ≥ J do
                A ← A^2; j ← j - 1
            A ← A · ∏_i g_i^(E_i)    {multiply A by table entry}
            while j > j_new do
                A ← A^2; j ← j - 1
    return A

Number of squarings: b - w up to b - 1.

Expected number of multiplications: $b \cdot \frac{1}{w + \sum_{n \ge 1} 2^{-kn}} = b \cdot \frac{1}{w + \frac{1}{2^k - 1}}$.
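
A direct transcription of this evaluation stage into Python might look as follows. It is a sketch only: the group is assumed to be $(\mathbb{Z}/n\mathbb{Z})^*$ for a hypothetical modulus `n`, the names are ours, and the table is again filled with `pow` instead of the multiplication-per-entry scheme counted above.

```python
from itertools import product


def simultaneous_sliding_window(gs, es, w, n):
    """Compute prod(g_i ** e_i) mod n with simultaneous sliding windows."""
    k = len(gs)
    b = max(e.bit_length() for e in es)

    # Precomputation stage: only tuples with at least one odd component.
    table = {}
    for E in product(range(1 << w), repeat=k):
        if any(Ei & 1 for Ei in E):
            entry = 1
            for g, Ei in zip(gs, E):
                entry = entry * pow(g, Ei, n) % n
            table[E] = entry

    def column_is_zero(pos):
        """True if bit `pos` is 0 in every exponent."""
        return all(((e >> pos) & 1) == 0 for e in es)

    A = 1
    j = b - 1
    while j >= 0:
        if column_is_zero(j):
            A = A * A % n
            j -= 1
        else:
            j_new = max(j - w, -1)
            J = j_new + 1
            while column_is_zero(J):  # shrink window until its low column is non-zero
                J += 1
            width = j - J + 1
            E = tuple((e >> J) & ((1 << width) - 1) for e in es)  # e_i[j...J]
            while j >= J:
                A = A * A % n
                j -= 1
            A = A * table[E] % n
            while j > j_new:  # skipped all-zero columns below the window
                A = A * A % n
                j -= 1
    return A


print(simultaneous_sliding_window([3, 5], [1234, 4321], 2, 1009))
```

Because the lowest column of each window is non-zero in at least one exponent, the tuple looked up always has an odd component, which is why the table can omit the all-even tuples.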

3 Interleaving Exponentiation Methods 

Here, we look at two interleaving exponentiation algorithms: A basic algorithm 
suitable for arbitrary groups (section 3.1) and a special variant using signed 
exponent recoding that can be applied if inverting elements is easy (section 3.2). 

The comments in the introduction to section 2 apply similarly, with the
exception that we no longer assume all the $b_i$ to be identical. Instead of a single
window size w, in this section we have k possibly different window sizes $w_i$
($1 \le i \le k$) used for the respective parts of the multi-exponentiation; each $w_i$ is
a small positive integer. Again we assume that initial squarings are eliminated
while A is $1_G$.

Note that for the algorithms described in this section, the precomputed table 
has disjoint parts for different bases gi. If multiple multi-exponentiations have 
to be performed and some of the bases gi appear again, then the corresponding 
parts of earlier precomputed tables can be reused. 

3.1 Basic Interleaving Exponentiation Method 

The basic interleaving exponentiation method is a generalization of the sliding
window method for a single exponentiation (see e.g. [15]), to which it corresponds
in case k = 1.

Precomputation Stage. For i = 1, ..., k, precompute $g_i^E$ for all odd E such
that $1 \le E \le 2^{w_i} - 1$.

Number of non-trivial table entries (multiplications): $\bigl( \sum_{1 \le i \le k} 2^{w_i - 1} \bigr) - k$.
Number of squarings: $\#\{ i \in \{1, \ldots, k\} \mid w_i > 1 \}$.

Evaluation Stage.

    A ← 1_G
    for i = 1 to k do
        window_handle_i ← nil
    for j = b - 1 down to 0 do
        A ← A^2
        for i = 1 to k do
            if window_handle_i = nil and e_i[j] = 1 then
                J ← j - w_i + 1
                while e_i[J] = 0 do
                    J ← J + 1
                {now j ≥ J > j - w_i and J ≥ 0}
                window_handle_i ← J
                E_i ← e_i[j ... J]
            if window_handle_i = j then
                A ← A · g_i^(E_i)    {multiply A by table entry}
                window_handle_i ← nil
    return A

Number of squarings: $b - \max_i w_i$ up to b - 1.

Expected number of multiplications:

$\sum_{1 \le i \le k} b_i \cdot \frac{1}{w_i + \sum_{n \ge 1} 2^{-n}} = \sum_{1 \le i \le k} b_i \cdot \frac{1}{w_i + 1}$
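
The following Python sketch mirrors the basic interleaving method for the group $(\mathbb{Z}/n\mathbb{Z})^*$. The modulus `n`, the function name, and the list-based bookkeeping are assumptions made for illustration; the paper itself is independent of any particular group representation.

```python
def basic_interleaving(gs, es, ws, n):
    """Compute prod(g_i ** e_i) mod n with one sliding window per base."""
    k = len(gs)
    b = max(e.bit_length() for e in es)

    # Precomputation stage: tables[i][E] = g_i^E for odd E in 1 .. 2^{w_i} - 1.
    tables = []
    for g, w in zip(gs, ws):
        g_squared = g * g % n  # only actually needed when w > 1
        t = {1: g % n}
        for E in range(3, 1 << w, 2):
            t[E] = t[E - 2] * g_squared % n
        tables.append(t)

    handle = [None] * k  # window_handle_i: position where the window ends
    window = [0] * k     # E_i: odd value of the currently open window
    A = 1
    for j in range(b - 1, -1, -1):
        A = A * A % n
        for i in range(k):
            if handle[i] is None and (es[i] >> j) & 1:
                # Bits at negative positions are zero, so clamping J at 0
                # is equivalent to starting from j - w_i + 1.
                J = max(j - ws[i] + 1, 0)
                while not (es[i] >> J) & 1:
                    J += 1
                handle[i] = J
                window[i] = (es[i] >> J) & ((1 << (j - J + 1)) - 1)
            if handle[i] == j:
                A = A * tables[i][window[i]] % n
                handle[i] = None
    return A


print(basic_interleaving([3, 5], [1234, 4321], [3, 4], 1009))
```

Each base keeps its own table, so the window sizes `ws` can differ per base, and a table built for one base could be kept for later calls that use the same base.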



3.2 wNAF-Based Interleaving Exponentiation Method 

In some groups, elements can be inverted very efficiently so that division is not 
significantly more expensive than multiplication. (Inversion is cheap in case of 
elliptic curves or class groups of imaginary quadratic number fields, but not in 
(Z/nZ)*.) This can be exploited for making exponentiation algorithms more effi- 
cient by recoding the exponents into a signed representation. We use a technique 
introduced for single exponentiations independently in [18] and in [16] and apply 
it to the task of multi-exponentiation. 

Given an exponent $e_i$ and a window size $w_i$, we need a width-($w_i + 1$)
non-adjacent form (width-($w_i + 1$) NAF or wNAF) of $e_i$, which is an array $N_i[b_i], \ldots,$
$N_i[0]$ of integers such that

- each $N_i[j]$ is either 0 or odd with an absolute value less than $2^{w_i}$;

- $e_i = \sum_{0 \le j \le b_i} N_i[j] \cdot 2^j$;

- at most one of any $w_i + 1$ consecutive components of the array is non-zero.

A width-($w_i + 1$) NAF always exists and is uniquely determined; it can be
computed by the following algorithm [19]:

    c ← e_i
    j ← 0
    while c > 0 do
        if c[0] = 1 then
            u ← c[w_i ... 0]
            if u[w_i] = 1 then
                u ← u - 2^(w_i + 1)
            c ← c - u
        else
            u ← 0
        N_i[j] ← u; j ← j + 1
        c ← c/2
    while j ≤ b_i do
        N_i[j] ← 0; j ← j + 1
    return N_i[b_i], ..., N_i[0]

The maximum possible index for a non-zero component of the wNAF of a
B-bit integer is B; i.e., the length of the wNAF without leading zeros may exceed
the length of the binary expansion by one. The average density (proportion of
non-zero components) in width-($w_i + 1$) NAFs is $1/(w_i + 2)$ for $B \to \infty$ [19].
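
As an illustration, the recoding algorithm can be written in a few lines of Python. The function name `wnaf` is our own, and unlike the array in the text the returned list is not padded with leading zeros up to index $b_i$; it is stored least significant digit first.

```python
def wnaf(e: int, w: int) -> list:
    """Return the width-(w+1) NAF of e >= 0, least significant digit first."""
    digits = []
    c = e
    while c > 0:
        if c & 1:
            u = c & ((1 << (w + 1)) - 1)  # u = c[w ... 0]
            if u >> w:                     # bit w of u is set: use a negative digit
                u -= 1 << (w + 1)
            c -= u                         # now c is divisible by 2^(w+1)
        else:
            u = 0
        digits.append(u)
        c >>= 1
    return digits


# With w = 1 this is the ordinary NAF: 7 = 8 - 1.
print(wnaf(7, 1))
```

Subtracting u clears the low w + 1 bits of c, which is what forces the next w digits to be zero and yields the non-adjacency property listed above.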



Precomputation Stage. For i = 1, ..., k, precompute $g_i^E$ for all odd E such
that $1 \le E \le 2^{w_i} - 1$. (As inversion in G is assumed to be easy, this makes $g_i^{-E}$
available as well.)

Number of non-trivial table entries (multiplications): $\bigl( \sum_{1 \le i \le k} 2^{w_i - 1} \bigr) - k$.
Number of squarings: $\#\{ i \in \{1, \ldots, k\} \mid w_i > 1 \}$.

Evaluation Stage.

    A ← 1_G
    for i = 1 to k do
        N_i[b], ..., N_i[b_i + 1] ← 0, ..., 0
        N_i[b_i], ..., N_i[0] ← width-(w_i + 1) NAF of e_i
    for j = b down to 0 do
        A ← A^2
        for i = 1 to k do
            if N_i[j] ≠ 0 then
                A ← A · g_i^(N_i[j])    {multiply A by [inverse of] table entry}
    return A

Number of squarings: $b - \max_i w_i$ up to b.

Expected number of multiplications: $\sum_{1 \le i \le k} b_i \cdot \frac{1}{w_i + 2}$.

We can compare wNAFs with the sliding window technique of the basic
interleaving exponentiation algorithm. Windows can be represented by components
of an array as in the wNAF approach: In the algorithm description of section 3.1,
$E_i$ provides component values; array indexes are given by window_handle_i. With
the array filled in accordingly, we can use the same evaluation stage algorithm
as in the wNAF-based method. The average density is $1/(w_i + 1)$ (each window
covers $w_i$ bits, and the number of additional zero bits between neighbouring
windows is 1 on average). With wNAFs, the average density goes down to $1/(w_i + 2)$
for exactly the same precomputation. Thus using wNAFs effectively increases
the window size by one.
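
The wNAF-based evaluation stage of section 3.2 can be sketched as below. Note the assumptions: we use $(\mathbb{Z}/p\mathbb{Z})^*$ with a hypothetical prime `p` purely so that the example runs with the standard library, even though inversion is not cheap in that group; inverses are stored in the table up front via `pow(x, -1, p)` (Python 3.8 or later), whereas in a group such as an elliptic curve they would be formed on the fly. All names are ours.

```python
def wnaf(e, w):
    """Width-(w+1) NAF of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        u = 0
        if e & 1:
            u = e & ((1 << (w + 1)) - 1)
            if u >> w:
                u -= 1 << (w + 1)
            e -= u
        digits.append(u)
        e >>= 1
    return digits


def wnaf_interleaving(gs, es, ws, p):
    """Compute prod(g_i ** e_i) mod p using signed windows (wNAFs)."""
    tables = []
    for g, w in zip(gs, ws):
        g_squared = g * g % p
        t = {1: g % p}
        for E in range(3, 1 << w, 2):          # odd powers g^3, g^5, ...
            t[E] = t[E - 2] * g_squared % p
        for E in list(t):                      # stand-in for cheap inversion
            t[-E] = pow(t[E], -1, p)
        tables.append(t)

    nafs = [wnaf(e, w) for e, w in zip(es, ws)]
    top = max(len(N) for N in nafs)

    A = 1
    for j in range(top - 1, -1, -1):
        A = A * A % p
        for N, t in zip(nafs, tables):
            if j < len(N) and N[j] != 0:
                A = A * t[N[j]] % p            # table entry or its inverse
    return A


print(wnaf_interleaving([3, 5], [1234567, 7654321], [3, 4], 1009))
```

The evaluation loop is the same as for unsigned windows; only the digit arrays differ, which reflects the observation above that wNAFs lower the density for exactly the same precomputation.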

4 Comparison of Simultaneous 

and Interleaving Exponentiation Methods 

There is no general rule for selecting window sizes for the multi-exponentiation 
algorithms that we have looked at. Various factors have to be considered: First 
of all, absolute memory constraints can impose limits on possible window sizes. 
Second, even if a particular window size appears to minimise the total amount 
of computation for a multi-exponentiation, sometimes slightly smaller windows 
may improve the actual performance; this is because larger window sizes mean 
larger precomputed tables, i.e. possibly additional memory allocation overhead 
and less effective memory caching. Last but not least, implementations can use 
different representations for group elements during different stages of the multi- 
exponentiation: For instance, extra effort may be spent during the precompu- 
tation stage in order to obtain representations of precomputed elements that 
speed up multiplication with them in the evaluation stage (e.g. affine rather 
than projective representations of points on elliptic curves [8]). 

These effects, however, do not mean that we cannot compare algorithms 
without looking at concrete cases: We can compare different aspects separately 
(table size, precomputation stage efficiency, evaluation stage efficiency) and look 
if an algorithm wins on all counts. 

For the following comparisons, we assume that all maximum exponent lengths
$b_i$ are the same (an assumption that we made in section 2 on simultaneous
exponentiation methods, but not in section 3 on interleaving exponentiation
methods). As before, let b be the length of the largest of the exponents $e_i$.

In section 4.1, we compare the simultaneous $2^w$-ary method with the basic
interleaving method and show that the latter is usually more efficient for k = 2 if
squarings are about as costly as multiplications. In section 4.2, we compare the
simultaneous sliding window method with the wNAF-based interleaving method
and show that the latter is more efficient for k = 2 and k = 3, assuming that
computing and storing the wNAFs is not too costly. Section 4.3 briefly discusses
the alternative multi-exponentiation method from [10] and shows that it is obviated
by our interleaving exponentiation methods. Finally, in section 4.4, we look at
some concrete figures for the number of multiplications required by different
methods for example values of k and b.

4.1 Comparison between the Simultaneous $2^w$-Ary Method
and the Basic Interleaving Method

While the simultaneous sliding window method is more efficient than the
simultaneous $2^w$-ary method, this section focuses on the latter. The reason is that the
$2^w$-ary method is often used in practice (e.g. [6]), possibly because it is perceived
to be simpler to implement. The basic interleaving exponentiation method is not
too complicated (in particular, indexes into the precomputed table are easy to
handle), and as we will see, it is often more efficient than the simultaneous $2^w$-ary
exponentiation method. So when the intention is to avoid the simultaneous
sliding window method, the basic interleaving method appears preferable for many
applications.

Assume that, given k and b, a certain w turns out to provide optimal efficiency
for the simultaneous $2^w$-ary exponentiation method (section 2.1) when performed
in a specific environment. Then the precomputation table requires $2^{kw} - 1 - k$
non-trivial entries, $2^{k(w-1)} - 1$ of which can be computed with one squaring each
(while each of the remaining entries requires one general multiplication).

For the basic interleaving exponentiation method (section 3.1), we can use
uniform window sizes $w_1 = \ldots = w_k = kw$. Then the precomputation table has
$k \cdot 2^{kw-1} - k$ non-trivial entries, each of which requires one general multiplication;
also k additional squarings are needed (unless k = w = 1).

Thus in case k = 2, the table grows from $2^{2w} - 3$ to $2^{2w} - 2$ non-trivial entries,
and instead of $2^{2w} - 3$ group operations of which $2^{2(w-1)} - 1$ are squarings,
we need $2^{2w}$ group operations of which only 2 are squarings. If squarings are
about as expensive as general multiplications, then for k = 2 the overall cost of
precomputation is comparable for these two multi-exponentiation methods.

The number of squarings in the evaluation stage is always nearly b for both
methods. The expected number of general multiplications in the evaluation stage
is smaller for the interleaving method (except if k = w = 1, in which case both
algorithms do exactly the same): Dividing the value for the basic interleaving
exponentiation method by the value for the simultaneous $2^w$-ary exponentiation
method yields

$\frac{k}{kw + 1} \cdot \frac{w}{1 - 2^{-kw}} = \frac{kw}{kw + 1} \cdot \frac{2^{kw}}{2^{kw} - 1}$,

and this is less than 1 for kw > 1 (the minimum is 64/75 at kw = 4).
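
The claim about this ratio is easy to check numerically. The short script below evaluates the right-hand expression exactly with rational arithmetic; the function name `ratio` and the tested range of kw are our own choices.

```python
from fractions import Fraction


def ratio(kw: int) -> Fraction:
    """Expected evaluation-stage multiplications of basic interleaving
    (window sizes kw) divided by those of the simultaneous 2^w-ary method."""
    return Fraction(kw, kw + 1) * Fraction(2**kw, 2**kw - 1)


for kw in range(1, 9):
    print(kw, ratio(kw), float(ratio(kw)))

# Position of the minimum over the tested range.
print(min(range(1, 40), key=ratio))
```

For kw = 1 the ratio is exactly 1, which corresponds to the case k = w = 1 in which both algorithms coincide.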

Note that using $w_1 = \ldots = w_k = kw$ is not necessarily an optimal choice
of window sizes for the basic interleaving exponentiation method; using smaller
or larger windows might lead to better performance. (Indeed, if we look just at
the number of operations and ignore memory usage, then there is no reason why
window sizes should depend on k.) While the above proof only covers the case
k = 2, there are actually many other cases where the basic interleaving method
is more efficient than the simultaneous $2^w$-ary method, even if general
multiplications are much more expensive than squaring; see table 1 in section 4.4. Also
note that the precomputation effort grows exponentially in k in simultaneous
methods, but not in interleaving methods.

4.2 Comparison between the Simultaneous Sliding Window Method
and the wNAF-Based Interleaving Method

Similarly to section 4.1, assume that a certain w provides optimal efficiency for
the simultaneous sliding window exponentiation method (section 2.2) for given
k and b. In the following analysis, we require k > 1. The precomputation table
has $2^{kw} - 2^{k(w-1)} - k$ non-trivial entries, each of which requires one general
multiplication to compute. In addition to this, k squarings are required for
precomputation unless w = 1.

For the wNAF-based interleaving exponentiation method (section 3.2), we
can use window sizes $w_1 = \ldots = w_k = kw - 1$. This leads to a precomputation
table with $k \cdot 2^{kw-2} - k$ non-trivial entries, requiring one general multiplication
each. In addition to this, we need k squarings unless kw = 2.

The difference between the number of non-trivial table entries (and general
multiplications) for these two methods is

$(2^{kw} - 2^{k(w-1)} - k) - (k \cdot 2^{kw-2} - k) = 2^{kw} \bigl( 1 - 2^{-k} - \frac{k}{4} \bigr)$.

This is positive for $k \le 3$ and negative for $k \ge 4$. Thus, with the $w_i$ chosen like
this, the precomputation stage of the wNAF-based interleaving exponentiation
method is more efficient if k = 2 or k = 3 (except for the case k = 3, w = 1,
where the wNAF-based interleaving exponentiation method saves one general
multiplication, but requires three additional squarings).

The evaluation stage requires close to b squarings for both methods. The
expected number of general multiplications is smaller for the wNAF-based
interleaving method: $b/(w + 1/k)$ instead of $b/(w + \frac{1}{2^k - 1})$.
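
The table-size difference and the two evaluation-stage estimates can be checked with a few lines of exact arithmetic. The function names below are ours, and the wNAF window sizes are fixed to kw - 1 as in the comparison above.

```python
from fractions import Fraction


def sliding_window_entries(k: int, w: int) -> int:
    """Non-trivial table entries of the simultaneous sliding window method."""
    return 2**(k * w) - 2**(k * (w - 1)) - k


def wnaf_entries(k: int, w: int) -> int:
    """Non-trivial table entries of wNAF interleaving with w_i = k*w - 1
    (requires k*w >= 2)."""
    return k * 2**(k * w - 2) - k


def sliding_window_mults_per_bit(k: int, w: int) -> Fraction:
    return 1 / (w + Fraction(1, 2**k - 1))


def wnaf_mults_per_bit(k: int, w: int) -> Fraction:
    return 1 / (w + Fraction(1, k))


for k in range(2, 6):
    for w in (1, 2):
        diff = sliding_window_entries(k, w) - wnaf_entries(k, w)
        print(k, w, diff,
              float(sliding_window_mults_per_bit(k, w)),
              float(wnaf_mults_per_bit(k, w)))
```

For k = 3 and w = 1 the printed difference is 1, the single general multiplication saved in the exceptional case mentioned above.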

The wNAF-based interleaving method with this choice of window sizes will
often provide better performance than the simultaneous sliding window method
for $k \ge 4$ as well: If additional memory allocation is not a problem, then the
efficiency gain of the evaluation stage usually compensates for the growth of the
precomputed table.

Similar to the situation in the preceding section, $w_1 = \ldots = w_k = kw - 1$ is
not necessarily an optimal choice, and smaller or larger window sizes might be
better (see section 4.4).



4.3 Comparison between the Dimitrov-Jullien-Miller
Multi-Exponentiation Method and Interleaving Exponentiation

A multi-exponentiation method for the case k = 2 requiring two precomputed
values (in addition to $g_1$ and $g_2$) if inverting is easy, or six precomputed values
if inversions have to be done during the precomputation stage, was described
by Dimitrov, Jullien, and Miller in [10]. This algorithm is related to the
simultaneous sliding window exponentiation method of Yen, Laih, and Lenstra [21]
(section 2.2 in the present paper), but uses signed recoding of exponents in order
to reduce the size of the precomputed table. While the Yen-Laih-Lenstra method
with a window size of 1 requires an expected number of b · 0.75 general
multiplications during the evaluation stage, the new method requires only about b · 0.534
multiplications according to [10] (the number of squarings stays about the same).
Yen-Laih-Lenstra with a window size of 2 needs only b · 3/7 ≈ b · 0.429
multiplications (table 3 of [10] erroneously assumes a value of b · 0.625), but has the
disadvantage of requiring more precomputed elements, which may be a problem
in some constrained environments.

We do not examine the algorithm of [10] in detail; note that it is outperformed
by the wNAF-based interleaving method of section 3.2 with $w_1 = w_2 = 2$ if
inversion is cheap (two precomputed values, b · 0.5 multiplications) and by the basic
interleaving method of section 3.1 with $w_1 = w_2 = 3$ otherwise (six precomputed
values, b · 0.5 multiplications).

4.4 Examples 

As noted before, endless variations are possible for defining optimisation goals.
In this section, we ignore memory usage and squarings and the issue of
different element representations; we make comparisons based just on the expected
number of general multiplications required by the various methods,
precomputation and evaluation stage combined. (Window sizes are chosen such that this
cost measure is minimised.) Note that the number of squarings is approximately
the same for the simultaneous sliding window method, the basic interleaving
method, and the wNAF-based interleaving method: No more than k squarings
are needed in the precomputation stage, and close to b squarings are needed
in the evaluation stage. The simultaneous $2^w$-ary method requires $2^{k(w-1)} - 1$
squarings for precomputation and again close to b evaluation stage squarings; so
ignoring the cost of squaring tends to favour this method.

Table 1 compares the number of general multiplications needed by these four
methods for various k and b values. The entries for the most efficient methods in
a particular configuration are printed in bold: For groups where inversion is easy
so that the wNAF-based method can be used, it wins in all of these examples; for
general groups, sometimes the simultaneous sliding window method and
sometimes the basic interleaving method requires the least number of multiplications.
(Remember that for w = 1 there is no difference between the simultaneous
$2^w$-ary method and the simultaneous sliding window method; for w > 1, the former
is always less efficient.)
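
The cost measure used in this section can be evaluated directly from the formulas of sections 2 and 3. The sketch below adds the number of precomputation multiplications to the expected number of evaluation-stage multiplications and minimises over the window size; the function names, the use of one uniform window size for all bases, and the search bound `max_w` are assumptions of ours.

```python
def c1(k, b, w):
    """Simultaneous 2^w-ary method."""
    return 2**(k * w) - 2**(k * (w - 1)) - k + b * (1 - 2**(-k * w)) / w


def c2(k, b, w):
    """Simultaneous sliding window method."""
    return 2**(k * w) - 2**(k * (w - 1)) - k + b / (w + 1 / (2**k - 1))


def c3(k, b, w):
    """Basic interleaving method with w_1 = ... = w_k = w."""
    return k * (2**(w - 1) - 1) + k * b / (w + 1)


def c4(k, b, w):
    """wNAF-based interleaving method with w_1 = ... = w_k = w."""
    return k * (2**(w - 1) - 1) + k * b / (w + 2)


def best(cost, k, b, max_w=12):
    """Return (minimal expected multiplications, window size attaining it)."""
    return min((cost(k, b, w), w) for w in range(1, max_w + 1))


for name, cost in (("c1", c1), ("c2", c2), ("c3", c3), ("c4", c4)):
    value, w = best(cost, 2, 256)
    print(name, round(value, 1), w)
```

Since the evaluation-stage terms are the asymptotic estimates described in section 2, the resulting figures are approximations rather than exact operation counts.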

5 Multi-exponentiation with Fixed Bases 

When many multi-exponentiations use the same bases gi, . . ., gk, it is suffi- 
cient to execute the precomputation stage just once, and we can try to make 
the evaluation stage more efficient by investing more work in precomputation. 
We cannot easily reduce the number of general multiplications in the evaluation 
stage, but we can reduce the number of squarings by using exponent splitting 
(cf. [5] and [9]) or the Lim-Lee “comb” method [14]. (Which approach is the 
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Table 1 . Expected number of general multiplications for a multi-exponentiation 
r[i<i<i; with exponents up to h bits (ci: simultaneous 2™-ary method, C 2 : simultane- 
ous sliding window method, C3: basic interleaving method, C4: wNAF-based interleaving 
method) 



k 


b = 


160 


b = 


256 


b = 


512 




1024 


b = 2048 




Cl 


44.5 


(it)=4) 


64.6 


(»=5) 


114.2 


(»=5) 


199.0 


(it— 6) 


353.3 


(»=7) 


1 


C2 


39.0 


(it)=4) 


57.7 


(»=5) 


100.3 


(»=5) 


177.3 


(it— 6) 


319.0 


(»=7) 


i 


C3 


39.0 


(w^=4) 


57.7 


(»i=5) 


100.3 


(»i=5) 


177.3 


(iuj=6) 


319.0 


(»i=7) 




C4 


33.7 


(wi=4) 


49.7 


(wi—4) 


88.1 


(»i=5) 


159.0 


(i(j^— 6) 


287.0 


(wi—fy) 




Cl 


85.0 


(iu=2) 


130.0 


(»=2) 


214.0 


(»=3) 


382.0 


(™=3) 


700.0 


(w=4) 


0 


C2 


78.6 


(w=2) 


119.7 


(»=2) 


199.6 


(»=3) 


353.2 


(™=3) 


660.4 


(»=3) 




C3 


78.0 


(wi=4) 


115.3 


(»i=5) 


200.7 


(»i=5) 


354.6 


(wi—6) 


638.0 


(»i=7) 




C4 


67.3 


(wi=4) 


99.3 


(itjj=4) 


176.3 


(»i=5) 


318.0 


(iui=6) 


574.0 


(w^—fy) 




Cl 


131.8 


(w=2) 


179.0 


(w^2) 


305.0 


(»=2) 


557.0 


(iu=2) 


1061.0 


(ui = 2) 




C2 


127.7 


(w=2) 


172.5 


(»=2) 


291.9 


(»=2) 


530.9 


(iu=2) 


1008.7 


(u> = 2) 


3 


C3 


117.0 


(wi=4) 


173.0 


(»i=5) 


301.0 


(»i=5) 


531.9 


(i(j^— 6) 


957.0 


(»i = 7) 




C4 


101.0 


(wi=4) 


149.0 


(wi=4) 


264.4 


(»i=5) 


477.0 


(iTi^e) 


861.0 


(wi—6) 




Cl 


161.0 


(iu = l) 


251.0 


(w^l) 


491.0 


(U) = l) 


746.0 


(w=2) 


1256.0 


(u>=2) 


A 


C2 


161.0 


(u; = l) 


251.0 


( 111 = 1 ) 


483.7 


(»=2) 


731.5 


(w=2) 


1227.0 


(u>=2) 


 k   Method
 4   C3     156.0 (w_i=4)    230.7 (w_i=5)    401.3 (w_i=5)    709.1 (w_i=6)   1276.0 (w_i=7)
     C4     134.7 (w_i=4)    198.7 (w_i=4)    352.6 (w_i=5)    636.0 (w_i=6)   1148.0 (w_i=6)

 5   C1     181.0 (w=1)      274.0 (w=1)      522.0 (w=1)     1018.0 (w=1)     2010.0 (w=1)
     C2     181.0 (w=1)      274.0 (w=1)      522.0 (w=1)     1018.0 (w=1)     1994.7 (w=2)
     C3     195.0 (w_i=4)    288.3 (w_i=5)    501.7 (w_i=5)    886.4 (w_i=6)   1595.0 (w_i=7)
     C4     168.3 (w_i=4)    248.3 (w_i=4)    440.7 (w_i=5)    795.0 (w_i=6)   1435.0 (w_i=6)

 6   C1     214.5 (w=1)      309.0 (w=1)      561.0 (w=1)     1065.0 (w=1)     2073.0 (w=1)
     C2     214.5 (w=1)      309.0 (w=1)      561.0 (w=1)     1065.0 (w=1)     2073.0 (w=1)
     C3     234.0 (w_i=4)    346.0 (w_i=5)    602.0 (w_i=5)   1063.7 (w_i=6)   1914.0 (w_i=7)
     C4     202.0 (w_i=4)    298.0 (w_i=4)    528.9 (w_i=5)    954.0 (w_i=6)   1722.0 (w_i=6)

 7   C1     278.8 (w=1)      374.0 (w=1)      628.0 (w=1)     1136.0 (w=1)     2152.0 (w=1)
     C2     278.8 (w=1)      374.0 (w=1)      628.0 (w=1)     1136.0 (w=1)     2152.0 (w=1)
     C3     273.0 (w_i=4)    403.7 (w_i=5)    702.3 (w_i=5)   1241.0 (w_i=6)   2233.0 (w_i=7)
     C4     235.7 (w_i=4)    347.7 (w_i=4)    617.0 (w_i=5)   1113.0 (w_i=6)   2009.0 (w_i=6)

 8   C1     406.4 (w=1)      502.0 (w=1)      757.0 (w=1)     1267.0 (w=1)     2287.0 (w=1)
     C2     406.4 (w=1)      502.0 (w=1)      757.0 (w=1)     1267.0 (w=1)     2287.0 (w=1)
     C3     312.0 (w_i=4)    461.3 (w_i=5)    802.7 (w_i=5)   1418.3 (w_i=6)   2552.0 (w_i=7)
     C4     269.3 (w_i=4)    397.3 (w_i=4)    705.1 (w_i=5)   1272.0 (w_i=6)   2296.0 (w_i=6)

 9   C1     661.7 (w=1)      757.5 (w=1)     1013.0 (w=1)     1524.0 (w=1)     2546.0 (w=1)
     C2     661.7 (w=1)      757.5 (w=1)     1013.0 (w=1)     1524.0 (w=1)     2546.0 (w=1)
     C3     351.0 (w_i=4)    519.0 (w_i=5)    903.0 (w_i=5)   1595.6 (w_i=6)   2871.0 (w_i=7)
     C4     303.0 (w_i=4)    447.0 (w_i=4)    793.3 (w_i=5)   1431.0 (w_i=6)   2583.0 (w_i=6)

10   C1    1172.8 (w=1)     1268.8 (w=1)     1524.5 (w=1)     2036.0 (w=1)     3059.0 (w=1)
     C2    1172.8 (w=1)     1268.8 (w=1)     1524.5 (w=1)     2036.0 (w=1)     3059.0 (w=1)
     C3     390.0 (w_i=4)    576.7 (w_i=5)   1003.3 (w_i=5)   1772.9 (w_i=6)   3190.0 (w_i=7)
     C4     336.7 (w_i=4)    496.7 (w_i=4)    881.4 (w_i=5)   1590.0 (w_i=6)   2870.0 (w_i=6)

most efficient depends on details of the situation such as exponent lengths, the
permissible size of the precomputed table, the relative cost of squarings versus
general multiplications, and whether the wNAF-based interleaving exponentiation
method is applicable.)

Let m be an arbitrary positive integer. Assuming that fixed exponent length
bounds $b_i$ are known, we show how to evaluate power products
$\prod_{1 \le i \le k} g_i^{e_i}$ with at most m - 1 evaluation stage squarings,
using a precomputed table independent of the specific exponents $e_i$.

5.1 Exponent Splitting 

Exponent splitting constructs a new power product representation by rewriting
each factor as follows:

$$g_i^{e_i} = \prod_{0 \le j < \lceil b_i/m \rceil} \left(g_i^{2^{jm}}\right)^{e_i[jm+m-1 \,\ldots\, jm]}$$

This leads to power products consisting of $\sum_{1 \le i \le k} \lceil b_i/m \rceil$
factors. Any multi-exponentiation method can be used for evaluating these power
products.

It is evident that for the multi-exponentiation methods described in this 
paper, exponent splitting does not help if k is already large and there are many 
large exponents. (In this case, instead of using precomputation table entries for 
additional bases, window sizes should be increased; then the evaluation stage will 
require more squarings, but fewer general multiplications than with exponent 
splitting.) 
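
As an illustration of exponent splitting, the following Python sketch splits a
single exponent into m-bit pieces and evaluates the resulting power product with
a plain bit-by-bit simultaneous method (window size 1), so that the evaluation
stage needs at most m - 1 non-trivial squarings. The function names, the modulus
and the choice of this simple evaluation method are assumptions made for the
example only; any multi-exponentiation method could be substituted.

```python
def split_exponent(e, b, m):
    """Split an exponent of at most b bits into ceil(b/m) pieces of m bits each."""
    pieces = -(-b // m)  # ceil(b / m)
    mask = (1 << m) - 1
    return [(e >> (j * m)) & mask for j in range(pieces)]


def precompute_bases(g, p, b, m):
    """Precompute g^(2^(j*m)) mod p; this table does not depend on the exponent."""
    pieces = -(-b // m)
    return [pow(g, 1 << (j * m), p) for j in range(pieces)]


def multi_exp(bases, exps, p):
    """Evaluate prod(bases[j] ** exps[j]) mod p, scanning all exponents bit by bit."""
    nbits = max((x.bit_length() for x in exps), default=0)
    acc = 1
    for bit in reversed(range(nbits)):
        acc = acc * acc % p  # one squaring per bit position (the first is trivial)
        for base, exp in zip(bases, exps):
            if (exp >> bit) & 1:
                acc = acc * base % p
    return acc


# Hypothetical parameters: 64-bit exponent bound, split into pieces of m = 16 bits.
p = 2**61 - 1
g = 5
b, m = 64, 16
e = 0xDEADBEEFCAFEBABE
result = multi_exp(precompute_bases(g, p, b, m), split_exponent(e, b, m), p)
```

With these parameters the loop in `multi_exp` runs over 16 bit positions instead
of 64, at the price of a table of four precomputed bases.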

5.2 Lim-Lee Precomputation 

To apply the Lim-Lee “comb” method, for every i we choose $w_i$ such that
$b_i \le w_i m$ and precompute

$$G_i(S) = g_i^{\sum_{s \in S} 2^s}$$

for all subsets $S \subseteq \{0, m, 2m, \ldots, (w_i - 1)m\}$. Note that then
every exponent $e_i$ of up to $b_i$ bits in length can be written as

$$e_i = \sum_{0 \le j < m} N_i[j] \cdot 2^j$$

where each $N_i[j]$ is an integer of the form $\sum_{s \in S} 2^s$ with S as
above. Thus we can use interleaving exponentiation with an evaluation stage
algorithm similar to section 3.2, but with a reduced number of iterations. The
$N_i[j]$ values for each iteration need not be stored in advance; they can be
extracted from the $e_i$ by tapping their bits in comb-shaped patterns, hence
the nickname of this method.
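
The comb-shaped bit extraction can be sketched for a single base as follows.
This is a minimal Python illustration under assumed parameters (the modulus, m
and w are chosen arbitrarily, and the table is indexed by a w-bit mask standing
for the subset S); it is not taken from the paper.

```python
def comb_table(g, p, m, w):
    """table[t] = g^(sum of 2^(u*m) over the set bits u of t) mod p."""
    base = [pow(g, 1 << (u * m), p) for u in range(w)]
    table = [1] * (1 << w)
    for t in range(1, 1 << w):
        low = t & -t                  # lowest set bit of t
        u = low.bit_length() - 1      # its index
        table[t] = table[t ^ low] * base[u] % p
    return table


def comb_exp(table, e, p, m, w):
    """Compute g^e mod p for an exponent of at most w*m bits."""
    assert e.bit_length() <= w * m
    acc = 1
    for j in reversed(range(m)):
        acc = acc * acc % p           # m squarings in total, the first one trivial
        t = 0
        for u in range(w):            # tap bits j, j+m, j+2m, ... of e
            t |= ((e >> (j + u * m)) & 1) << u
        acc = acc * table[t] % p      # table[0] == 1, so a zero column costs nothing
    return acc


# Hypothetical parameters: exponents of up to 160 bits, m = 32, w = 5.
p = 2**127 - 1
g = 3
m, w = 32, 5
e = 0x1234567890ABCDEF1234567890ABCDEF12345678
table = comb_table(g, p, m, w)
result = comb_exp(table, e, p, m, w)
```

The table has 2^w entries, of which the entry for the empty set and the entry
for g itself are trivial.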

A refinement of this (also from [14]) is based on the observation that the
precomputed table can be reduced in size in exchange for additional evaluation
stage multiplications: Partition $\{0, m, 2m, \ldots, (w_i - 1)m\}$ into $v_i$
subsets $T_{i,1}, \ldots, T_{i,v_i}$; now each of the above sets S can be written
as $\bigcup_{1 \le n \le v_i} S_n$ with $S_n = S \cap T_{i,n}$, and then we have
$G_i(S) = \prod_{1 \le n \le v_i} G_i(S_n)$. Thus it suffices to precompute
$G_i(S_n)$ for all non-empty subsets $S_n \subseteq T_{i,n}$ for all n; from this
precomputed data, $G_i(S)$ can be computed in at most $v_i - 1$ multiplications.

While Lim-Lee precomputation reduces the number of squarings, the expected
number of general multiplications is larger than for the basic interleaving
exponentiation method with a similarly sized precomputed table. (In the basic
interleaving method, $2^{w_i - 1} - 1$ non-trivial precomputed values suffice to
make sure that each evaluation stage multiplication covers $w_i$ exponent bits,
and we can skip many additional zero bits thanks to the sliding window. With
Lim-Lee precomputation, we need at least $2^{w_i} - 2$ non-trivial precomputed
values to be able to cover $w_i$ exponent bits with each evaluation stage
multiplication, and we lose the advantage of a sliding window.) Thus if k is
large, using Lim-Lee precomputation is a disadvantage.

Note that it is possible to use Lim-Lee precomputation for some of the bases 
and standard precomputation (as in section 3) for others. This does not help 
for multi-exponentiation in these mixed cases, but precomputed data can then 
profitably be reused for pure Lim-Lee cases. 

Without going into details, we remark that the Lim-Lee method can be considered
an application of exponent splitting using specific multi-exponentiation
algorithms suited for small exponents. For example, if k = 1, the simple Lim-Lee
method uses “Shamir’s trick”, i.e. simultaneous exponentiation with a window
size of 1. Further algorithmic variations are possible.

6 Conclusion 

In many cases, the basic interleaving exponentiation method compares favourably
to the simultaneous $2^w$-ary method, in particular if k = 2 and squarings are
about as costly as general multiplications. In groups where inverting elements
is easy, the wNAF-based interleaving exponentiation method is available; its
efficiency is superior even to the sliding window variant of simultaneous
exponentiation both in the precomputation stage and the evaluation stage if
k = 2 or k = 3, and it is usually more efficient for larger k as well. In all
cases, interleaving exponentiation provides the following advantages over
simultaneous exponentiation:

— Improved efficiency if the bit-lengths of the exponents differ significantly. 

— More flexibility in choice of the size of the auxiliary table (and, hence, the 
time spent on precomputation), particularly if k is large. 

— Better handling of situations where one or more of the gi are fixed while 
others are variable between multiple multi-exponentiations: a corresponding
part of the precomputation has to be done only once. (This is the case in
DSA, ECDSA, and RDSA signature verification if multiple signatures are
verified that are based on the same underlying parameters.)

Thus, depending on circumstances, either the simultaneous sliding window
method or one of the interleaving exponentiation methods may be advantageous.

It is easy to implement interleaving exponentiation for variable k. As the
special case k = 1 of the basic and wNAF-based interleaving exponentiation
methods yields the usual sliding window exponentiation method and wNAF-based
exponentiation method, respectively, this makes it unnecessary to implement
these separately.
Abstract. In this paper we address two important topics in hyperelliptic
cryptography. The first is how to construct in a verifiably random
manner hyperelliptic curves for use in cryptography in genera two and
three. The second topic is how to perform divisor compression in the
hyperelliptic case. Hence, in both cases we generalise concepts used in
the more familiar elliptic curve case to the hyperelliptic context.



1 Introduction 

Elliptic curve cryptography was co-invented in 1985 by V. Miller [13] and N. 
Koblitz [11]. Cryptography based on elliptic curves is especially attractive due 
to the supposed difficulty of the discrete logarithm problem in the group of 
rational points on an elliptic curve. In 1989 Koblitz generalised this concept 
to hyperelliptic curves [12]. In hyperelliptic cryptography the hard problem on 
which the security is based is the discrete logarithm problem in the divisor class 
group of the curve. 

Whilst elliptic curve cryptography is starting to become commercially deployed,
hyperelliptic cryptography is still at the stage of academic interest. This
is mainly due to the greater complexity of the underlying arithmetic and the
fact that the protocols have been less standardised. One main problem in the
hyperelliptic case, as argued in [16], is that it is currently very hard to
generate hyperelliptic curves for use in cryptography which do not have any
added extra structure.^1 Another problem is that the supporting algorithms which
exist in the elliptic curve case have not been fully developed in the
hyperelliptic case. In this paper we generalise two such techniques from the
setting of elliptic curve cryptography to the setting of hyperelliptic curves.

^1 There is a new general point counting algorithm by Kedlaya [10] for hyperelliptic
curves in small odd characteristic. However, it is believed that this algorithm can be
extended to the even characteristic case. At present the authors know of no
implementation of this algorithm and so we cannot comment on its practical efficiency.

Just before submitting the final version of this paper to the conference proceedings,
Pierrick Gaudry informed us that the AGM method presented at the rump session
of EUROCRYPT 2001 can now be used to compute the group order of a Jacobian of
a hyperelliptic curve in genus two over a field of characteristic two. Indeed the AGM
method is practical for cryptographically sized Jacobians. Hence, the AGM method
for genus two should be preferred to ours since it allows a truly random
curve to be used rather than one from a special family.

S. Vaudenay and A. Youssef (Eds.): SAC 2001, LNCS 2259, pp. 181-189, 2001.
(c) Springer-Verlag Berlin Heidelberg 2001

In the first we give a method to produce hyperelliptic curves in genus two 
and three which are generated in a verifiably random manner. In the second we 
give a method to perform divisor compression. 

The first contribution is needed to produce suitable curves in a trusted man- 
ner. In elliptic curve cryptography, one way to choose a curve is to generate 
curves at random until one satisfies the correct security requirements. However, 
someone else then using the system needs to trust that you did not construct a 
special curve which has some weakness that only you know about. To overcome 
this problem various standards bodies, e.g. [1], have proposed that the curve is 
generated in the following manner: 

1. Generate in any manner a 160 bit string, S. 

2. Using SHA-1 on this string generate some elliptic curve E in a known
deterministic manner.

3. Compute the group order N using either the Schoof-Elkies-Atkin algorithm
or one of the extensions to Satoh’s algorithm, see [3], [14], [15] and [18].

4. If the curve passes the known security checks then publish the triple 

(S,E,N), 

otherwise return to the first step. 

Under the assumption that SHA-1 is a one-way function the above method
of curve generation prevents the choice of special elliptic curves with secret
weaknesses. An elliptic curve chosen in the above way is said to have been
chosen “verifiably at random” since any third party given the triple (S, E, N)
can check very quickly that not only is the group order N correct but that the
curve could not have been created with a known weakness since it would have
been computationally impossible to reverse engineer the value of S which gave
E using the above algorithm.
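
The seed-to-parameter step and its verification can be sketched in Python as
follows. The expansion used here (SHA-1 over the seed followed by a counter) is
a hypothetical stand-in chosen for illustration and is not the procedure fixed
by the standard [1]; the group order computation and the security checks of
steps 3 and 4 are left out entirely.

```python
import hashlib


def derive_parameter(seed: bytes, field_bits: int) -> int:
    """Deterministically expand a seed into an integer of at most field_bits bits.

    Hypothetical derivation: concatenate SHA-1(seed || counter) blocks and keep
    the leading field_bits bits.
    """
    out = b""
    counter = 0
    while len(out) * 8 < field_bits:
        out += hashlib.sha1(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return int.from_bytes(out, "big") >> (len(out) * 8 - field_bits)


def verify_parameter(seed: bytes, field_bits: int, claimed: int) -> bool:
    """A third party recomputes the derivation and compares it with the claim."""
    return derive_parameter(seed, field_bits) == claimed


# Example: a 160-bit seed expanded to a 166-bit field element (as an integer).
seed = bytes.fromhex("E4D1C989A8999ED0EF8AC7D691E5D8ADDAD481F5")
b_param = derive_parameter(seed, 166)
```

Anyone holding the seed can repeat the derivation, whereas producing a seed
that maps to a chosen parameter would require inverting the hash.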

We show how the above algorithm can be used to generate verifiably random 
hyperelliptic curves in characteristic two for use in cryptography. Our method 
does not produce random hyperelliptic curves taken from the totality of all hy- 
perelliptic curves but produces hyperelliptic curves which have verifiably been 
constructed in a random manner from a certain well defined subset of all hy- 
perelliptic curves. In other words it is computationally infeasible for us to have 
created a special curve with some hidden weakness. However, we stress that 
since our method produces random hyperelliptic curves from a special family it 
is possible that the curves constructed by our method have a weakness which 
we are not aware of. For further details of how special the families we construct 
actually are the reader should consult the paper [6]. 

Previous attempts at generating cryptographically strong hyperelliptic curves
have been based on analogues from the elliptic case, namely generalisations of
the SEA algorithm or the CM method. In [8] a first attempt at an analogue of the
SEA algorithm for hyperelliptic curves of genus two is reported on. The authors
manage to compute the order of a random hyperelliptic curve of genus two of
group order roughly $2^{126}$. However, this takes them many days of computing
time. In practice one would need to repeat their method a large number of
times before a suitable curve for use in cryptography was determined. Whilst
the method in [8] is to be preferred over ours, it can only be used when (and if)
the algorithms become sufficiently fast. Our method on the other hand, as we
have already stated, is practical using today’s knowledge and technology.

A number of authors have looked at using an analogue of the CM method to
generate hyperelliptic curves for use in cryptography, [17], [19] and [5]. However,
this has a number of drawbacks compared to our method above. Firstly, the
existing literature on applying the CM method to hyperelliptic curves only
applies to large odd characteristic and not characteristic two as our method does.
Secondly, the set of curves produced by the CM method in practice, if one could
implement it in characteristic two, would be from a far more restricted set than
the set of curves generated by our method.

Our second contribution is to give a method in all characteristics to perform 
divisor compression. In the elliptic curve case it is common practice to use a 
technique called point compression to reduce the sizes of the public keys being 
transported by fifty percent. This is done by noticing that an elliptic curve point 
(x, y) can be represented by x and a bit to decide which value of y to use. This is 
particularly important when deploying ECC in an environment where bandwidth 
is constrained. We will show that the elliptic curve point compression techniques 
can be naturally generalised to the hyperelliptic setting. 

The first author would like to thank J. Cannon for his support while this 
work was in preparation. 



2 Producing Hyperelliptic Curves 

Our technique of producing hyperelliptic curves verifiably at random is based
on the method of Weil restriction of scalars as outlined in [9]. In this technique
one takes an elliptic curve E over the field $K = \mathbb{F}_{q^n}$, where q is a
power of two, and then one constructs a hyperelliptic curve H over the subfield
$k = \mathbb{F}_q$. Since the groups $E(K)$ and $\mathrm{Jac}_k(H)$ are related by
a group homomorphism one can easily compute, in certain cases, the group order
of $\mathrm{Jac}_k(H)$.

To fix notation we are trying to generate a hyperelliptic curve over the 
field Fq, of genus g and of group order N = 2*p, where p is a prime. Before giving 
our technique for the generation of hyperelliptic curves we need to summarise 
the main security requirements for our curve. 

— $p > 2^{160}$. This is to protect against Pohlig-Hellman, Pollard-rho and
Baby-Step/Giant-Step attacks.

— $g < 4$. This is to protect against the method of Gaudry [7].

— $q = 2^r$, where r is prime. This is to protect against using Weil descent on
$\mathrm{Jac}_{\mathbb{F}_q}(H)$.

— The smallest $s \ge 1$ such that $q^s \equiv 1 \pmod{p}$ should be greater than 20g.
This is to protect against the Tate-pairing attack [4].

Note, there are no other conditions which give curves with a known weakness 
and all the above conditions can be easily checked given the curve and its group 
order. 
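
Given the parameters, the four conditions can be checked mechanically. The
following Python sketch is one way to do so; the function names are invented
for this example, p is assumed to be already known to be prime, and the test
values are arbitrary rather than taken from a real curve.

```python
def is_small_prime(n: int) -> bool:
    """Trial division; adequate for the small exponent r."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def embedding_degree_exceeds(q: int, p: int, bound: int) -> bool:
    """True if q^s != 1 (mod p) for every s with 1 <= s <= bound."""
    x = 1
    for _ in range(bound):
        x = x * q % p
        if x == 1:
            return False
    return True


def check_requirements(p: int, g: int, r: int) -> dict:
    """Evaluate the four security conditions for group order 2p, genus g, q = 2^r."""
    q = 1 << r
    return {
        "large_prime_factor": p > (1 << 160),
        "small_genus": g < 4,
        "prime_extension_degree": is_small_prime(r),
        "tate_pairing": embedding_degree_exceeds(q, p, 20 * g),
    }


# Arbitrary test values: p = 2^521 - 1 is a known Mersenne prime.
report = check_requirements(p=2**521 - 1, g=2, r=83)
```

Each entry of the returned dictionary corresponds to one item of the list
above, so a curve is acceptable only if all four values are true.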

In [9] a method is given for finding a group homomorphism from an elliptic
curve defined over $\mathbb{F}_{q^n}$ to a hyperelliptic curve H defined over
$\mathbb{F}_q$. The technique given is completely deterministic, although the
resulting model for H is not in the standard form, an issue which we shall
return to below. The method of [9] uses a set of Artin-Schreier extensions, the
number of distinct extensions being given by an integer m, which satisfies
$1 \le m \le n$. For the exact definition of m see [9]; all that we shall require
is that m = n and that the genus of the resulting hyperelliptic curve is either
$2^{m-1}$ or $2^{m-1} - 1$. In our applications we are able to control precisely
when we obtain genus $2^{m-1}$ or genus $2^{m-1} - 1$.

Since we wish to produce hyperelliptic curves with Jacobians of the same
group order as $E(K)$ we need to choose elliptic curves so that

$$n = 2^{m-1} \quad\text{or}\quad n = 2^{m-1} - 1.$$

Since one of our security requirements on g is that it should be less than four,
these conditions are easy to satisfy.

For cryptographic purposes it is advantageous to produce a model for the
hyperelliptic curve of the form

$$H : Y^2 + H(X)Y = F(X)$$

where $\deg H(X) \le g$ and $\deg F(X) = 2g + 1$. Such a model will be called
“reduced” and we shall now describe a deterministic method to turn the
hyperelliptic model, produced by the method of [9], into a reduced model. This
is important, and was not addressed in [9]. If we wish to generate hyperelliptic
curves verifiably at random we require a deterministic mapping from the elliptic
curve to a reduced model of a hyperelliptic curve.

Assume that a fixed representation has been chosen for the finite fields of
size $q^n$ and q. Using these fixed representations we can define (lexicographical)
orders in the finite fields, hence orders on polynomials, matrices etc. Utilising
normalisation of polynomials, polynomial division, Hermite normal forms and
other such reduction techniques we are then able to always consider the smallest
(or the same) object having a desired property.

Taking the model for H produced by the method in [9] we then move the 
smallest rational point to infinity. A reduced hyperelliptic equation is then ob- 
tained by computing the minimal polynomial over the rational subfield of a 
function of smallest odd pole order at infinity and with no other poles. 

Since the algorithm, outlined above, to proceed from an elliptic curve to a 
reduced model for a hyperelliptic curve is completely deterministic, all we need 
do to produce a verifiably “random” hyperelliptic curve is to find an elliptic 
curve verifiably at “random” with the required properties. 
2.1 Genus Two

Take a finite field of the form $K = \mathbb{F}_{q^2}$ where q is 2 raised to a
prime exponent. We construct, using the technique from [1], a verifiably random
elliptic curve of the form

$$Y^2 + XY = X^3 + aX^2 + b$$

where $a, b \in K$, with group order equal to 2p where p is a prime number. Note
that since p is a prime number and q is ‘large’, in the Weil descent we almost
always obtain m = 2 and so the resulting hyperelliptic curve will have genus
two. Then using the technique of Weil descent we can construct a hyperelliptic
curve over the field $k = \mathbb{F}_q$ which has group order divisible by p.
Since the Weil restriction of E and $\mathrm{Jac}_k(H)$ have the same dimension,
they are therefore isogenous. But they then have the same number of points over
k and so $\mathrm{Jac}_k(H)$ will have group order exactly 2p.

2.2 Genus Three 

For genus three we need to proceed in a slightly different way. First we choose a
finite field of the form $K = \mathbb{F}_{q^3}$ where again q is 2 raised to a
prime exponent. Then we take a random 160-bit string and pass it through SHA-1
to obtain a field element $v \in \mathbb{F}_{q^3}$ using the methods of [1].
Setting $b = v + v^q$ we see that

$$\mathrm{Tr}_{K/k}(b) = 0.$$

We then compute the elliptic curve

$$Y^2 + XY = X^3 + X^2 + b$$

and its group order. This is repeated until we find a group order equal to 2p
where p is a prime. Then using the arguments of [9] we will obtain a hyperelliptic
curve of genus three. Although we are not choosing elliptic curves completely at
random from all elliptic curves defined over K, we are choosing them uniformly
at random from a subset of size $q^2$. Just as before, we will have that
$\mathrm{Jac}_k(H)$ has group order exactly 2p.
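
The trace condition and the size of the subset can be checked by brute force in
a toy field. The Python sketch below takes q = 4 and K = GF(4^3) = GF(2^6),
represented with the irreducible polynomial x^6 + x + 1; these tiny parameters
and the helper names are assumptions made for illustration and have no
cryptographic relevance.

```python
MOD = 0b1000011  # x^6 + x + 1, irreducible over GF(2)
Q = 4            # k = GF(4), K = GF(4^3) = GF(2^6)


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^6) given as 6-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000000:  # reduce as soon as degree 6 appears
            a ^= MOD
    return r


def gf_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation in GF(2^6)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def trace_K_over_k(b: int) -> int:
    """Tr_{K/k}(b) = b + b^q + b^(q^2)."""
    return b ^ gf_pow(b, Q) ^ gf_pow(b, Q * Q)


# All values b = v + v^q as v runs through K.
candidates = {v ^ gf_pow(v, Q) for v in range(64)}
```

Every element of `candidates` has trace zero, and the set has q^2 = 16
elements, matching the subset size stated above.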



Our technique for constructing hyperelliptic curves for use in cryptography is
dominated by the time needed to apply the Schoof-Elkies-Atkin (SEA) algorithm
or the algorithm of Satoh to a set of elliptic curves, until one with the correct
cryptographic properties is determined. The step of transforming the elliptic
curve into a hyperelliptic curve only takes a few seconds. Hence, to compute a
single hyperelliptic curve of genus two with the correct cryptographic properties
takes, for a Jacobian of size roughly $2^{160}$, on the order of a couple of minutes.
The main computational task is to repeatedly apply the SEA/Satoh algorithm
until a suitable elliptic curve is found. Of course, exact times depend strongly
on the details of the SEA/Satoh implementation.

Finally to end this section we give a typical example: 
n = 166

Elliptic Curve : K is defined by + 1 = 0 

S = E4D1C989A8999ED0EF8AC7D691E5D8ADDAD481F5, 
a = 3951AD54028E7E3CF2D437A4186CCB53BF5DD39196, 
b = 140463F3747C98BAE9D9D31EAF3FCE65ADF80AEA26, 

N = 3FFFFFFFFFFFFFFFFFFFF730032E01F3184452AA1A. 

Hyperelliptic Curve : k is defined by f®® + + 1 = 0. 

H(X) = 6C935CFDD963AD086B738 X^2 + 103FEA81D67CBF0210A96 X
       + 47242588808C36BFBE701,

F(X) = 660212F23F5C16AE899A9 X^5 + 6CAEC90C545CF269FE5B1 X^4
       + 5A55B3786562759A427E0 X^3 + 32C4479705A4CEBF1FEA3 X^2
       + 7F018AAEC622917758194 X + 2BDCB9CD696E5142054C8.

3 Divisor Compression 

As noted previously point compression in the elliptic curve case is an important 
tool used to save around fifty percent of the bandwidth in transferring/storing 
public keys and in Diffie-Hellman key exchange. Before describing our analogous 
method in the hyperelliptic setting we shall describe the exact data format nor- 
mally used for divisors on hyperelliptic curves. For more details on what follows 
the reader should consult the papers by Cantor [2] and Koblitz [12]. In this 
section we shall work with arbitrary characteristic fields. 

A hyperelliptic curve of genus g, over a field k of characteristic p, we will
assume is given by an equation of the form

$$Y^2 + H(X)Y = F(X),$$

where $H(X), F(X) \in k[X]$, $\deg H(X) \le g$ and $\deg F(X) = 2g + 1$. For
applications it is common to assume that either p is very large or equal to two.
If p is large we usually assume that H(X) = 0. Notice that in characteristic two
the ramified places lying above $p(X) \in k[X]$ are exactly those for which p(X)
divides H(X).

The group elements, upon which our cryptographic protocols operate, are
effective reduced divisors of degree less than or equal to g. Such a divisor can be
represented by the pair

$$D = (a(X), b(X)),$$

where $a(X), b(X) \in k[X]$, $\deg b(X) < \deg a(X) \le g$, a(X) is monic and

$$b(X)^2 + H(X)b(X) - F(X) \equiv 0 \pmod{a(X)}.$$

The zero in the group is represented by the pair (1, 0). That the divisor is reduced
means that no ramified place occurs in the support of D with multiplicity greater
than one, and that if a place p occurs in the support of D then the image of
p under the hyperelliptic involution does not. In many protocols one needs to
transmit divisors; naively this requires at most g elements of k to represent a(X)
and at most g elements of k to represent b(X).

However, given a{X) there are only a small number of possible values for 
b{X) which could correspond to a{X). We shall show how one can recover the 
correct b{X) from only a{X), and at most an additional g bits of information. 

Our first task is to decide a canonical order on the irreducible polynomials of 
degree less than or equal to g, which are defined over k. This is done by fixing a 
field representation and using the lexicographic order used for a similar purpose 
in Section 2. 

When we are either compressing or decompressing we first factorize a{X) 
into its irreducible factors and order them. Since factorisation of polynomials 
can be performed in random polynomial time, and in applications the degree of 
a(X) will be quite small (usually less than four) this factorisation stage is no 
barrier to our method. 

For example when g = 2 we need to factorize a degree two polynomial. 
This factors either when a certain trace is zero, for the even characteristic case, 
or when the discriminant is a square, for the odd characteristic case. In either 
characteristic we can easily deduce the factorisation when the polynomial is 
reducible using standard techniques for solving quadratic equations over finite 
fields. Similar considerations apply when g = 3. 

Each irreducible factor p(X) of a(X) will correspond to at most two prime
divisors on H:

$$D_p = (p(X), q(X)) \quad\text{and}\quad D'_p = (p(X), -q(X) - H(X) \pmod{p(X)}),$$

where q(X) is the polynomial of least degree such that

$$q(X)^2 + H(X)q(X) - F(X)$$

is divisible by p(X). Since the divisor we are compressing or decompressing is
reduced we know that only one of these two possibilities is in the support of D.
Hence, for each prime divisor of a(X) we need only specify one bit of information
to determine whether $D_p$ or $D'_p$ is in the support of D. The only questions
remaining are how to produce this bit and how to recover the correct value of
b(X), given a(X) and the resulting bits.

3.1 Compression 

The basic idea is to execute the following steps for every distinct irreducible
factor p(X) of a(X); this gives the bits $\beta_p$.

1. If p(X) is ramified in k(H) set $\beta_p = 0$.

2. If the characteristic of k is odd, and so H(X) = 0, then let $\beta_p$ denote the
parity of the smallest non-zero coefficient of b(X) (mod p(X)).

3. If the characteristic of k is even then we set

$$t(X) = b(X)/H(X) \pmod{p(X)};$$

notice that the inversion of H(X) modulo p(X) can be accomplished since
p(X) is unramified and so gcd(p(X), H(X)) = 1. We then let $\beta_p$ denote the
least significant bit of the constant term of t(X).

Hence, the compressed form of the divisor D is (a(X), s) where s is the bit string
containing the $\beta_p$ for each irreducible factor of a(X). The bit string is ordered
with respect to the ordering on the distinct irreducible factors of a(X).

3.2 Decompression 

Suppose $p(X)^k$ exactly divides a(X); then if we can recover b(X) modulo $p(X)^k$
for all irreducible factors p(X) of a(X) we can then recover b(X) either via the
Chinese Remainder Theorem or by adding together the local components for
each prime p(X).

Since (a(X), b(X)) is a reduced divisor, we know that if p(X) is ramified then
the value of k above is one, and recovering b(X) modulo p(X) is trivial, since it
will be equal to zero modulo p(X).

We now turn to the case where p(X) is not ramified. Then recovering b(X)
modulo $p(X)^k$ is trivially done once we know b(X) (mod p(X)). This recovery
of b(X) modulo $p(X)^k$ from b(X) (mod p(X)) can be accomplished in one of
two ways:

1. Using Hensel’s Lemma.

2. By multiplying the divisor (p(X), b(X) (mod p(X))) by k.

So we have reduced the decompression problem to determining the value of

b(X) (mod p(X))

given p(X) and the bit $\beta_p$.

Since p(X) is irreducible, the algebra k[X]/p(X) is a field and we can apply
well known techniques to solve quadratic equations in a field to determine a
candidate value $\tilde{b}(X)$ for b(X) (mod p(X)). To check whether $\tilde{b}(X)$ is
the correct value we compute the value of the bit $\beta_p$, as in the compression
algorithm, assuming that $\tilde{b}(X)$ is correct. If this value agrees with the
supplied value then we know that $b(X) = \tilde{b}(X)$ (mod p(X)), otherwise we set
$b(X) = -\tilde{b}(X) - H(X)$ (mod p(X)).

Finally, note that the above algorithms for divisor compression and
decompression are only slightly more complicated than those used in the elliptic
curve case.
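
The odd characteristic case can be illustrated with a deliberately simplified
Python sketch. It assumes a prime field with P ≡ 3 (mod 4), H(X) = 0, and a
polynomial a(X) that splits into distinct linear factors X - x_i, so that b(X)
modulo each factor is just the constant y_i = b(x_i). The curve, the prime and
all function names are hypothetical; factorisation of a(X) and the final
Chinese Remainder step are omitted.

```python
P = 10007                      # small prime with P % 4 == 3 (illustration only)
F = [1, 3, 0, 0, 0, 1]         # F(X) = X^5 + 3X + 1, coefficients low degree first


def poly_eval(coeffs, x):
    """Evaluate a polynomial at x modulo P using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def sqrt_mod(v):
    """Square root modulo P for P ≡ 3 (mod 4); None if v is a non-residue."""
    r = pow(v, (P + 1) // 4, P)
    return r if r * r % P == v % P else None


def compress(points):
    """Map the support points (x_i, y_i) to the roots of a(X) plus one bit each."""
    pts = sorted(points)       # canonical order on the factors X - x_i
    xs = [x for x, _ in pts]
    bits = [0 if y == 0 else y & 1 for _, y in pts]  # ramified points get bit 0
    return xs, bits


def decompress(xs, bits):
    """Recover each y_i from x_i and its bit by solving y^2 = F(x_i)."""
    pts = []
    for x, bit in zip(xs, bits):
        y = sqrt_mod(poly_eval(F, x))
        if y is None:
            raise ValueError("x is not the abscissa of a point on the curve")
        if y != 0 and (y & 1) != bit:
            y = P - y          # the other root; it has the opposite parity
        pts.append((x, y))
    return pts
```

Because P is odd, the two roots y and P - y always differ in parity, which is
why a single bit per factor is enough to select the right one.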
Abstract. SC2000 is a 128-bit block cipher with key length of 128,
192 or 256 bits, developed by Fujitsu Laboratories LTD. For 128-bit
keys, SC2000 consists of 6.5 rounds, and for 192- and 256-bit keys it
consists of 7.5 rounds. In this paper we demonstrate two different
3.5-round differential characteristics that hold with probabilities
$2^{-106}$ and $2^{-107}$. These characteristics can be used to extract up to
32 bits of the first and last round keys in a 4.5-round variant of SC2000.



1 Introduction 

SC2000 [1,5] is a 128-bit block cipher designed by Fujitsu Laboratories LTD,
and accepts keys of 128, 192 and 256 bits. The cipher has been submitted as a
candidate for the Nessie project [2], and was presented at the Nessie workshop
in Leuven in November 2000, and at FSE2001 in Yokohama in April 2001. In
the submission the designers analysed SC2000 against differential cryptanalysis
[3], and gave lower bounds on the complexities of a differential attack based
on characteristics. However, this search for differential characteristics does not
necessarily reveal those with the highest probabilities. We found two different
characteristics over 3.5 rounds, which can be used to extract 32 of the bits in
both the first and the last round key in a 4.5-round variant (the definition of
a half round will become clear below). These characteristics have probabilities
$2^{-106}$ and $2^{-107}$.

The paper is organised as follows. In Section 2 we give a brief description of 
the SC2000 algorithm. In Section 3 we give the best characteristic we found for 
the most complicated part of the Feistel round function, used in the cipher round 
function. In Section 4 we create the different characteristics based on the findings 
of Section 3. In Section 5 we extract bits from the first and last round keys by 
using these characteristics, and we conclude in Section 6 with some remarks on 
the design of SC2000. 



* This work was supported by the European Union fund IST-1999-12324 - Nessie. 
The information in this document is provided as is, and no warranty is given or 
implied that the information is fit for any particular purpose. The user thereof uses 
the information at its sole risk and liability. 
2 Description of SC2000

The plaintext block in SC2000 is broken into four 32-bit words. The plaintext
words are first XORed with a round key, and then passed through a layer of 32
parallel 4-bit S-boxes. We will call one of these 4-bit S-boxes S4 in this paper.
The inputs to S4 are the bits that are in the same position in each of the words,
see Fig. 1. After this the block is XORed with another round key. The XOR with
a round key, the 32 executions of S4, and the XOR with a different round key
is what we call one half round. The block is now broken into halves, and passed
through a two-round Feistel network.

Fig. 1. How the 4-bit S-box works on the bits in position j.

The round function in the Feistel network is depicted in Fig. 2. Each of the two
words that are sent into the round function is first passed through a layer of
two 6-bit and four 5-bit S-boxes, called S6 and S5 respectively. To create
diffusion, each word is then regarded as a vector of length 32 over GF(2), and
pre-multiplied with M, a 32x32 matrix over GF(2). Let the two words of output
from the multiplication of M be $o_1$ and $o_2$. These words are now mixed in a
linear function to create the two words of output from the Feistel round
function, $b_1$ and $b_2$, as follows.

$$b_1 = (o_1 \wedge m) \oplus o_2$$

$$b_2 = (o_2 \wedge \bar{m}) \oplus o_1$$

Here m is a 32-bit constant, $\bar{m}$ its bitwise complement, and $\wedge$ denotes
the logical AND operation. $b_1$ and $b_2$ are now XORed onto the two words in the
other half, and the halves are swapped. The other half is then passed through the
Feistel round function and XORed onto the first half, but there is no swap after
the second Feistel round. This concludes one round of SC2000. For 128-bit keys the
cipher consists of six full rounds, plus the first half of the seventh round. For
192- and 256-bit keys the cipher consists of seven full rounds, plus the first half
of the eighth round. The constants m used in each round are 55555555 (hexadecimal)
in the odd numbered rounds and 33333333 (hexadecimal) in the even numbered rounds.
One round of the cipher is shown in Fig. 3.

We omit the details of the key scheduling. The key schedule in SC2000 is
quite complex, and our attack does not depend on how the key schedule works.
We note however that the key schedule appears to be very strong: the knowledge
of one round key does not seem to leak any information about any other round
key, or about the key selected by the user.
Fig. 2. The Feistel round function.

Fig. 3. The round function of SC2000.



3 Searching for Differential Characteristics 

In [1] and [5] the designers have performed some differential cryptanalysis of 
SC2000. However, as shown in the sequel, the designers’ search for characteristics 
was not sufficient, and several differentials exist with probabilities exceeding the 
bounds of the designers. 

In order to explain how we found the differential used in our attack, we first
define the support of an n-bit string $w = (w_1, w_2, \ldots, w_n)$, written $\chi(w)$, to be
the set of coordinates where $w$ has a non-zero value.

$\chi(w) = \{\, i \mid w_i \neq 0 \,\}$
We concentrated on the first two components of the F-function (see Fig. 2),
namely the layer of S5's and S6's, and the multiplication with M. The F-function
takes two 32-bit words as input, but they are not mixed with each other until
after these two steps are executed, so we focused on only one of the input words.
The idea was to find a differential $\delta$ with low Hamming weight that mapped to a
differential $\epsilon$ after the S-boxes and the M-multiplication, such that $\chi(\epsilon) \subseteq \chi(\delta)$.
To help us with this we first computed the two differential distribution tables
for S5 and S6.

We searched through all 32-bit words of Hamming weight six or less, and
for each word we did the following. The word (or differential) $\epsilon$ was assumed
to be the output of the M-multiplication. Since this is a linear component, we
multiplied $\epsilon$ with $M^{-1}$ to find the $a$ that would map to $\epsilon$ through M. By looking
up in the two distribution tables, we then checked whether $\epsilon$ could be mapped
to $a$ through the layer of the S5 and S6 S-boxes.

We found 11 differentials of Hamming weight five, which only had four of the
six S-boxes active and were mapped to themselves with some non-zero probabil-
ity. None of the $\epsilon$'s of weight four or less could be mapped to themselves, but
for each of them we checked how many of the S-boxes failed to do the re-
quired mapping from $\epsilon$ to $a$. We found one $\epsilon$ of weight two, namely $\epsilon = \mathtt{40200000}_x$,
that mapped to the corresponding $a = \mathtt{f7d30017}_x$ in four of the six S-boxes. By
adding a 1-bit in the differences going into the two remaining S-boxes, we were
able to produce $\delta = \delta_0 = \mathtt{40220001}_x$ of weight four that is mapped to $a$ with
probability $2^{-18}$ (the probabilities are $2^{-5}$ for the two S6's, and $2^{-4}$ for the two
active S5's). In fact, there are eight different $\delta$'s of weight four that can map to
$\epsilon$. In one of the two S-boxes that require a non-zero difference we can add the
1-bit in two different ways, and in the other S-box we can add the 1-bit in four
different ways. The seven other $\delta$'s are

$\delta_1 = \mathtt{40220004}_x$, $\delta_2 = \mathtt{40220010}_x$, $\delta_3 = \mathtt{40220020}_x$, $\delta_4 = \mathtt{40300001}_x$,

$\delta_5 = \mathtt{40300004}_x$, $\delta_6 = \mathtt{40300010}_x$, $\delta_7 = \mathtt{40300020}_x$.
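As a quick sanity check on the eight listed differences, the following minimal sketch computes the Hamming weight and the support of each one. The bit-numbering convention (coordinate 0 is the least significant bit) and all helper names are assumptions made for illustration; they do not come from the paper.

```python
EPS = 0x40200000  # the weight-two difference epsilon from the text

# delta_0 .. delta_7 as listed in the text
DELTAS = [
    0x40220001, 0x40220004, 0x40220010, 0x40220020,
    0x40300001, 0x40300004, 0x40300010, 0x40300020,
]

def support(w: int, n: int = 32) -> set:
    """Set of coordinates (0 = least significant bit, an assumed convention)
    where the n-bit word w has a non-zero bit."""
    return {i for i in range(n) if (w >> i) & 1}

def weight(w: int) -> int:
    """Hamming weight of w."""
    return bin(w).count("1")

weights = [weight(d) for d in DELTAS]
contained = [support(EPS) <= support(d) for d in DELTAS]

print(weights)    # Hamming weight of every delta_i
print(contained)  # whether supp(epsilon) is a subset of supp(delta_i)
```

Every listed difference has weight four and contains the support of the weight-two difference, which is the property the later S4 analysis relies on.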

Now the idea was to send the differentials 0 and $\delta$ into the F-function, and
have $\delta$ map to $\epsilon$. In the third and last part of the F-function, the AND operation
with the fixed masks does not affect the 0 difference. The AND operation applied
to $\epsilon$ will turn $\epsilon$ into $\epsilon_2 = \mathtt{00200000}_x$ when the mask $\mathtt{33333333}_x$ or $\mathtt{aaaaaaaa}_x$ is
used, and turn $\epsilon$ into $\epsilon_4 = \mathtt{40000000}_x$ when the mask $\mathtt{55555555}_x$ or $\mathtt{cccccccc}_x$ is
used. Finally, $\epsilon$ will be XORed onto the 0 difference and 0 will be XORed onto the
difference that is either $\epsilon_2$ or $\epsilon_4$. In total we have the differential characteristics
$(\delta, 0) \rightarrow (\epsilon_4, \epsilon)$ and $(0, \delta) \rightarrow (\epsilon, \epsilon_2)$ in a round where the mask $\mathtt{55555555}_x$ is
used, and the differential characteristics $(\delta, 0) \rightarrow (\epsilon_2, \epsilon)$ and $(0, \delta) \rightarrow (\epsilon, \epsilon_4)$
when the mask $\mathtt{33333333}_x$ is used. Each of these characteristics has probability
$2^{-18}$.
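The effect of the masks on the difference can be checked mechanically. The sketch below applies the linear mixing step of the F-function directly to differences, which is valid because that step is linear over GF(2). The function name `mix` and the constant names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def mix(o1: int, o2: int, m: int) -> tuple:
    """Linear mixing at the end of F, applied to differences:
    b1 = (o1 AND m) XOR o2, b2 = (o2 AND NOT m) XOR o1."""
    m_bar = ~m & MASK32
    return ((o1 & m) ^ o2, (o2 & m_bar) ^ o1)

EPS = 0x40200000
EPS2 = 0x00200000
EPS4 = 0x40000000
ODD_MASK = 0x55555555   # used in odd numbered rounds
EVEN_MASK = 0x33333333  # used in even numbered rounds

# (eps, 0) is the difference after M when delta enters in the first word,
# (0, eps) when delta enters in the second word.
for name, m in (("55555555", ODD_MASK), ("33333333", EVEN_MASK)):
    b_first = mix(EPS, 0, m)
    b_second = mix(0, EPS, m)
    print(name, [hex(v) for v in b_first], [hex(v) for v in b_second])
```

The four printed output differences correspond to the four characteristics through F stated in the text.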
4 Building the One-Round Characteristics

Let us now see how we can use $\delta$ and $\epsilon$ to create a differential characteristic
through several rounds of SC2000.

4.1 Two One-Round Differential Characteristics

The first one-round characteristic, explained below, is shown in Figure 4, where
we have omitted the additions of round keys since they do not affect the analy-
sis. Let $(\delta, 0, 0, 0)$ be the difference in the blocks before the two Feistel rounds.

Fig. 4. A one-round differential characteristic with probability $2^{-30}$.

First the two rightmost words are sent through the F-function. They have dif-
ference (0,0), so the output will have difference (0,0) with probability 1. This
(0,0)-difference is XORed onto the left half, and the halves are swapped so the
difference before the second Feistel round is $(0, 0, \delta, 0)$. The right halves with
difference $(\delta, 0)$ are then sent into the F-function, and with probability $2^{-18}$ the
difference after multiplication with M will be $(\epsilon, 0)$. After this $\epsilon$ will meet one of
the masks $\mathtt{55555555}_x$ or $\mathtt{33333333}_x$, so the output of F will be $(\epsilon_4, \epsilon)$ or $(\epsilon_2, \epsilon)$.
These outputs are XORed onto the left halves, and since there is no swap, the
difference of the blocks becomes $(\epsilon_4, \epsilon, \delta, 0)$ or $(\epsilon_2, \epsilon, \delta, 0)$ before the S4 layer.

Since $\delta$ has weight four and $\chi(\epsilon_i) \subseteq \chi(\epsilon) \subseteq \chi(\delta)$, there will only be four
active S-boxes in the layer of the 32 S4's. Two of them will have input differ-
ence $2_x$, one will have input difference $6_x$, and one will have input difference
$e_x$. All the differences $2_x$, $6_x$ and $e_x$ can go to the difference $4_x$ through S4,
each with probability $2^{-3}$. So with probability $(2^{-3})^4 = 2^{-12}$ we get the charac-
teristic $(\epsilon_i, \epsilon, \delta, 0) \rightarrow (0, \delta, 0, 0)$ through S4. All together, we get the following
characteristic with probability $2^{-18} \cdot 2^{-12} = 2^{-30}$.

$(\delta, 0, 0, 0) \rightarrow (0, \delta, 0, 0)$
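The S4 input differences quoted above can be reproduced by gathering, for every bit position, the four bits that feed one S4. The sketch below assumes that the first word supplies the most significant bit of each 4-bit input; this ordering is an assumption, chosen because it makes the output difference $(0, \delta, 0, 0)$ correspond to the nibble $4_x$ as in the text. The function name is hypothetical.

```python
def s4_input_differences(a: int, b: int, c: int, d: int) -> dict:
    """Map each active bit position j to the 4-bit difference entering
    the S4 at that position (word a is taken as the most significant bit)."""
    active = {}
    for j in range(32):
        nibble = ((((a >> j) & 1) << 3) | (((b >> j) & 1) << 2)
                  | (((c >> j) & 1) << 1) | ((d >> j) & 1))
        if nibble:
            active[j] = nibble
    return active

DELTA0 = 0x40220001
EPS, EPS2, EPS4 = 0x40200000, 0x00200000, 0x40000000

before_s4 = s4_input_differences(EPS4, EPS, DELTA0, 0)
after_s4 = s4_input_differences(0, DELTA0, 0, 0)
print({j: hex(v) for j, v in before_s4.items()})
print({j: hex(v) for j, v in after_s4.items()})
```

Both choices of mask give the same multiset of input differences; only the positions of the $6_x$ and $e_x$ nibbles are exchanged.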

The other useful one-round characteristic is the one that starts with the
difference $(0, \delta, 0, 0)$. After the first Feistel round with the swap the difference
becomes $(0, 0, 0, \delta)$. With probability $2^{-18}$ the right half difference $(0, \delta)$ becomes
$(0, \epsilon)$ after multiplication with M. The output difference of F will then be $(\epsilon, \epsilon_2)$
or $(\epsilon, \epsilon_4)$ and after the XOR with the left halves the difference will be $(\epsilon, \epsilon_i, 0, \delta)$.
Two of the input differences to S4 will now be $1_x$, one of them will be $9_x$ and
one will be $d_x$. The $1_x$ and $d_x$ differences can lead to the difference $8_x$ with
probability $2^{-3}$, and the $9_x$ difference goes to $8_x$ with probability $2^{-2}$. This
gives us the following one-round characteristic with probability $2^{-29}$.

$(0, \delta, 0, 0) \rightarrow (\delta, 0, 0, 0)$

4.2 Concatenating the One-Round Characteristics 

The differential characteristics above start and end just before the Feistel rounds. 
The cipher itself begins with the application of the S4 layer, but the characteris- 
tics we build by concatenating the one-round characteristics will start after the 
first half of the first round. The next section explains how to use these charac- 
teristics. 

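The characteristic $(\delta, 0, 0, 0) \rightarrow (0, \delta, 0, 0)$ can be concatenated with
$(0, \delta, 0, 0) \rightarrow (\delta, 0, 0, 0)$. By doing this, we get the following differential
characteristic through three and a half rounds with probability $2^{-107}$.

$(\delta, 0, 0, 0) \rightarrow (0, \delta, 0, 0) \rightarrow (\delta, 0, 0, 0) \rightarrow (0, \delta, 0, 0) \rightarrow (\epsilon, \epsilon_4, 0, \delta)$

The other characteristic is the one starting with input $(0, \delta, 0, 0)$.

$(0, \delta, 0, 0) \rightarrow (\delta, 0, 0, 0) \rightarrow (0, \delta, 0, 0) \rightarrow (\delta, 0, 0, 0) \rightarrow (\epsilon_2, \epsilon, \delta, 0)$

This characteristic has probability $2^{-106}$.
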
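The probabilities of the concatenated characteristics follow from adding exponents. The sketch below does this bookkeeping with the values as read from the text ($2^{-18}$ for one pass through F, $2^{-3}$ or $2^{-2}$ per active S4); the variable names are illustrative only.

```python
# All quantities are -log2 of a probability.
P_F = 18                      # delta -> epsilon through the S5/S6 layer and M
S4_FIRST = 3 + 3 + 3 + 3      # differences 2, 2, 6, e each go to 4
S4_SECOND = 3 + 3 + 3 + 2     # differences 1, 1, d go to 8 (2^-3), 9 goes to 8 (2^-2)

round_a = P_F + S4_FIRST      # (delta,0,0,0) -> (0,delta,0,0)
round_b = P_F + S4_SECOND     # (0,delta,0,0) -> (delta,0,0,0)

# Three full one-round characteristics followed by one more pass through F.
chain_1 = round_a + round_b + round_a + P_F   # starts with (delta,0,0,0)
chain_2 = round_b + round_a + round_b + P_F   # starts with (0,delta,0,0)

print(round_a, round_b, chain_1, chain_2)
```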



5 Extracting Bits from the First and Last Round Key 

In this section we will explain how to extract up to 32 bits from both the first 
and last round key in a 4.5-round variant of SC2000. 

5.1 How to Find 16 Key Bits of the First and Last Round Key 

The characteristics in the previous section do not start with the plaintext differ-
ence, but with the difference after the first S4 layer. To use these characteristics,
we create structures $S$ of $2^{16}$ plaintexts as follows. Fix the bits going into the 28
S4's that are not affected by $\delta$, and let the 16 bits going into the S4's determined
by $\delta$ take on all $2^{16}$ values. Let $\Delta_8 = (\delta, 0, 0, 0)$, $\Delta_4 = (0, \delta, 0, 0)$, $\Omega_1 = (\epsilon, \epsilon_4, 0, \delta)$
and $\Omega_2 = (\epsilon_2, \epsilon, \delta, 0)$. For each plaintext $P \in S$, the plaintext $P \oplus \Delta_i$ is also in
$S$, for $i = 4, 8$. In other words, of the $\binom{2^{16}}{2} \approx 2^{31}$ pairs in $S$ there are $2^{15}$ pairs
with difference $\Delta_4$ and $2^{15}$ pairs with difference $\Delta_8$. Encrypting the plaintexts
in $S$ through the first S4 layer does not change the structure in any essential
way. The 112 fixed bits remain fixed, and the other 16 bits range over all $2^{16}$
values. So there will be $2^{15}$ pairs in $S$ that have difference $\Delta_4$, and $2^{15}$ pairs
that have difference $\Delta_8$ after encryption through the first S4 layer. A randomly
chosen input difference to one S4 will have the possibility to go to the output
difference $8_x$ with probability 1/2, so the probability that a randomly chosen
pair of texts from $S$ will have the possibility of having difference $\Delta_8$ after S4
is $2^{-4}$. By the same argument, the probability that a pair of texts from $S$ can
have the difference $\Delta_4$ after S4 is approximately $2^{-4}$. In total, the probability
that a randomly chosen pair of texts from $S$ has difference $\Delta_4$ or $\Delta_8$ after S4 is
$2^{-3}$.

We call a pair of plaintexts that follows either of the two characteristics from 
Section 4 a right pair, and a pair of plaintexts that does not follow any of these 
characteristics a wrong pair. 

The probability that a structure contains a right pair is $2^{15} \cdot (2^{-106} + 2^{-107}) =
3 \cdot 2^{-92}$. After encrypting $2^{93}$ structures, we expect to have $2^{93} \cdot 3 \cdot 2^{-92} = 6$ right
pairs among the $2^{93} \cdot 2^{31} = 2^{124}$ pairs we get from the structures. We filter out
most of the wrong pairs as follows.

Find potential good pairs by inserting the $2^{16}$ ciphertexts from one structure
in a hash-table according to 20 bits in the first word (see [4]). The ciphertexts in
a right pair will be inserted in the same position in the table. If a pair is a right
pair, all of the 112 bits corresponding to the inactive S4's must be equal. This
gives a filtering factor of $2^{-112}$. If a pair of ciphertexts is equal in these 112 bits,
check the differences in the four S4's corresponding to $\delta$. If the pair is a right
pair, it must be possible that the output difference from the last S4-layer has
had input difference $\Omega_1$ or $\Omega_2$. As explained above, a random pair passes this
test with probability $2^{-3}$. If a ciphertext pair is a right pair, and had difference
$\Omega_1$ before the last S4, then the pair of plaintexts must have had the possibility
to get the difference $\Delta_8$ after the initial S4. A random pair passes this test with
probability $2^{-4}$. Likewise, a right pair that has difference $\Omega_2$ before the last S4
must have had difference $\Delta_4$ after the first S4, and the probability that this
holds for a random pair is $2^{-4}$.

With these steps we have a filtering factor of $2^{-119}$. After using this filtering
procedure on the $2^{124}$ different pairs we expect to be left with $2^{124} \cdot 2^{-119} = 32$
pairs, among which we expect six right pairs.
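The counting argument can be replayed with exact arithmetic. This is a small sketch using the figures as read from the text (structures of $2^{16}$ plaintexts, $2^{93}$ structures, characteristic probabilities $2^{-106}$ and $2^{-107}$); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, log2

pairs_per_structure = comb(2**16, 2)        # roughly 2^31 pairs
pairs_per_difference = 2**15                # pairs with difference Delta_4 (or Delta_8)
p_right = Fraction(1, 2**106) + Fraction(1, 2**107)

structures = 2**93
expected_right_pairs = structures * pairs_per_difference * p_right

filter_exponent = 112 + 3 + 4               # inactive bits, last S4 test, first S4 test
total_pairs_exponent = 93 + 31              # log2 of all pairs over all structures
surviving_pairs = 2 ** (total_pairs_exponent - filter_exponent)

chosen_plaintexts_exponent = 93 + 16        # log2 of the data for one delta

print(round(log2(pairs_per_structure), 3))
print(expected_right_pairs, surviving_pairs, chosen_plaintexts_exponent)
```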

The main part of the work to generate 16 potentially right pairs comes from
the $2^{109}$ encryptions required. The memory requirements to get the 16 potentially
good pairs are small. In addition to the potential right pairs, we only need to hold
$2^{16}$ plaintexts and the corresponding $2^{16}$ ciphertexts in memory at the same
time.

The rest of the attack follows along the lines of a standard differential attack.
For each of the ciphertexts in the 16 potentially right pairs, guess on the 16 key
bits from the last round key corresponding to the active S-boxes. For each guess,
decrypt the ciphertext bits in the active S4's; if the decrypted values have one of
the input differences $\Omega_1$ or $\Omega_2$, suggest these bits as part of the last round key.
The correct value will be suggested for each right pair. For each $\Omega_i$, there will
be 2 to 4 4-bit values suggested for each S-box, and the values suggested by each
S-box can be combined in $2^4$ to $4^4$ different ways. Since we accept both $\Omega_1$ and $\Omega_2$
as input difference, we will get 32 to 512 suggestions for the 16 key bits. The right
value will be suggested for every right pair, i.e. six times. The suggestions of the
wrong values are expected to be distributed more or less uniformly over the $2^{16}$
different values, so it is highly unlikely that any wrong value will be suggested
six times. Take the most suggested value as the correct bits in the last round
key.
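The voting step can be illustrated on a single 4-bit S-box. The sketch below is a toy: it uses a hypothetical stand-in S-box (the one from the PRESENT cipher, not SC2000's S4, whose table is not given here), one key nibble, and an arbitrary wanted difference. It only demonstrates the principle that the correct key nibble is suggested by every right pair.

```python
from collections import Counter

# Hypothetical stand-in S-box (PRESENT's), NOT the S4 of SC2000.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV = [SBOX.index(y) for y in range(16)]

def suggest(c1: int, c2: int, wanted_diff: int) -> list:
    """Key nibbles under which the pair decrypts to the wanted input difference."""
    return [k for k in range(16) if INV[c1 ^ k] ^ INV[c2 ^ k] == wanted_diff]

def collect_votes(pairs, wanted_diff: int) -> Counter:
    """Count how often each key nibble is suggested over all pairs."""
    votes = Counter()
    for c1, c2 in pairs:
        votes.update(suggest(c1, c2, wanted_diff))
    return votes

# Simulate right pairs: inputs differ by `diff`, outputs are masked by the key.
true_key, diff = 0x9, 0x6
inputs = [0x0, 0x3, 0x5, 0xA, 0xC, 0xF]
pairs = [(SBOX[x] ^ true_key, SBOX[x ^ diff] ^ true_key) for x in inputs]

votes = collect_votes(pairs, diff)
print(votes.most_common(3))
```

In this toy setting a wrong nibble can tie with the correct one; in the attack described in the text the vote is taken over 16 key bits at once, where such coincidences are expected to be much rarer.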

We find 16 bits of the first round key in the same manner. Guess on the 16 bits
corresponding to the active S4's in the first round, and for each guess, encrypt
each pair of plaintexts through the active S4's. If a pair gets the difference $\Delta_4$
or $\Delta_8$, suggest the value as part of the first round key. Again there will be 32 to
512 suggestions for every pair. We expect the correct value to be suggested six
times, and the incorrect values to be more or less uniformly distributed over the
$2^{16}$ values. Again we take the most suggested value as the correct one.

5.2 How to Find Another 16 Bits 

We can repeat the attack described above for a different $\delta$, say for $\delta_5 = \mathtt{40300004}_x$,
to find 16 bits of the first and last round keys. Among the key bits we will find,
eight of them will be the same as we found using $\delta_0$, because the two active
S-boxes defined by $\epsilon$ will overlap in both attacks. So repeating the attack with
$\delta_5$ will only yield eight new bits in the two round keys. After this we have found
24 bits of each key. The last eight bits we can get are the ones that correspond to
the S4's defined by $\mathtt{00000010}_x$ and $\mathtt{00000020}_x$. They can be found by repeating
the attack with $\delta$'s using these S-boxes, like $\delta_2 = \mathtt{40220010}_x$ and $\delta_3 = \mathtt{40220020}_x$.
Repeating the attack four times gives an overall complexity of $2^{111}$.

6 Conclusions 

For a 4.5 round variant of SC2000, we have shown how to find 32 bits of both
the first and the last round key, using $2^{111}$ chosen plaintexts. The strong key
schedule in SC2000 prevents us from actually breaking 4.5 rounds by searching
exhaustively for the remaining 96 bits in the first or last round key, since we
cannot easily deduce the other round keys from them.

This paper may teach us a different lesson, though. In several places in [1], the
designers hint that SC2000 can be thought of as an advanced Feistel cipher. The 
layer of 4-bit S-boxes between every other Feistel round can be regarded as a 
cryptographically stronger component than the swap of halves found in ordinary 
Feistel ciphers. This S-box layer certainly gives better confusion than a simple 
swap, but it introduces another weakness not found in regular Feistel ciphers. 

It was shown in [5] that in SC2000 it is possible to have a differential char-
acteristic that feeds every other Feistel round with a 0-difference. This is not
possible in a regular Feistel cipher. In this paper we have extended the search
done in [5] to two-round iterative characteristics. This resulted in characteristics
with higher probabilities than what was found in [5].

Having an S-box layer instead of a swap between some rounds might be a
good idea, but one should be careful to make sure that any cryptographically
good property of the swap is not lost when replacing it. The designers state that
one of the design criteria for S4 is that except for the all-zero difference, an input
difference $(\alpha_0, \alpha_1, 0, 0)$ cannot lead to an output difference $(\beta_0, \beta_1, 0, 0)$, and an
input difference $(0, 0, \alpha_2, \alpha_3)$ cannot lead to an output difference $(0, 0, \beta_2, \beta_3)$.
This is to make sure that there is some form of “swap” involved when going
through S4. However, one should also demand that if $\alpha_L$ and $\alpha_R$ are two non-
zero 2-bit values, then the input difference $(\alpha_L, \alpha_R)$ will always lead to an output
difference $(\beta_L, \beta_R)$ where both $\beta_L$ and $\beta_R$ are non-zero.
Abstract. Linear cryptanalysis remains the most powerful attack
against DES at this time. Given $2^{43}$ known plaintext-ciphertext pairs,
Matsui expected a complexity of less than $2^{43}$ DES evaluations in 85
% of the cases for recovering the key. In this paper, we present a the-
oretical and experimental complexity analysis of this attack, which has
been simulated 21 times using the idle time of several computers. The
experimental results suggest a complexity upper-bounded by $2^{41}$ DES
evaluations in 85 % of the cases, while more than half of the ex-
periments needed less than $2^{39}$ DES evaluations. In addition, we give a
detailed theoretical analysis of the attack complexity.

Keywords: linear cryptanalysis, DES 



1 Introduction 

Linear cryptanalysis against DES [10] has been introduced by Matsui [6,7] and 
remains at this time the most powerful attack against this cipher. A single exper- 
imental implementation [7] has been carried out. During this attempt, Matsui 
managed to break a DES key in about 50 days on 12 powerful computers, the 
plaintext-ciphertext pairs generation lasting 40 days and the exhaustive search 
for the remaining unknown bits taking the last 10 days. It was noticed that the 
second phase performed faster than one could expect theoretically. 

Although several authors have studied, generalized and applied the linear crypt-
analysis concept in several ways, little work concerning its success probability
and its complexity has been done, and while it is widely accepted that linear
cryptanalysis of DES, given $2^{43}$ known plaintext-ciphertext pairs, has a success
probability of 85 % within a complexity of $2^{43}$ DES evaluations, it was conjec-
tured that this value is pessimistic [9,3].

Motivated by this fact, by the parallel implementation concept of Biham [1] and
by current 64-bit processor performance, we propose in this paper a theoretical
and experimental complexity analysis. By using a fast DES routine implemented
for the Intel MMX architecture, the production part of the attack has been run
several times, virtually breaking a total of 21 keys.

This paper is organized as follows: in §2, we recall some theoretical background 
on the attack. In §3, we describe briefly the design of the fast DES routine and 
the attack implementation. In §4, we discuss and complete the success probabil- 
ity and complexity model. In §5, we discuss some issues on the linear expression 
biases, the piling-up approximation and the wrong-key randomization hypothe-
sis, comparing the known theoretical results to our experimental ones and finally
we give in §6 our experimental results.



2 Matsui’s Attack 

In this paper, we deal with the improved attack [7] proposed by Matsui against
DES. The attack's core is unbalanced linear expressions, i.e. equations involving
a modulo two sum of plaintext and ciphertext bits on the left and a modulo two
sum of key bits on the right. Such an expression is unbalanced if it is satisfied with
probability^1 $p = \frac{1}{2} + \kappa\epsilon$ with $0 < \epsilon \le \frac{1}{2}$ and $\kappa \in \{-1, 1\}$ when the plaintexts and
the key are independent and chosen uniformly at random and where $\kappa$ depends
on the key value.

Given some plaintext bits $P_{i_1}, \ldots, P_{i_r}$, ciphertext bits $C_{j_1}, \ldots, C_{j_s}$ and key bits
$K_{k_1}, \ldots, K_{k_t}$, and using the notation $X_{l_1} \oplus X_{l_2} \oplus \ldots \oplus X_{l_u} = X_{[l_1, \ldots, l_u]}$, we can
write a linear expression $\mathcal{L}$ as

$\mathcal{L} : P_{[i_1, \ldots, i_r]} \oplus C_{[j_1, \ldots, j_s]} = K_{[k_1, \ldots, k_t]} \qquad (1)$

Matsui's improved attack operates on 14 rounds using two biased linear expres-
sions which collect statistical information on 26 bits out of the first and last
round subkeys. The remaining 30 unknown key bits have to be searched exhaus-
tively. The linear expression (1) involves thus two terms of F-function and can
be rewritten as

$\mathcal{L} : P_{[i_1, \ldots, i_r]} \oplus C_{[j_1, \ldots, j_s]} \oplus F^{(1)}_{[m_1, \ldots, m_v]}(P, K^{(1)}) \oplus F^{(16)}_{[n_1, \ldots, n_w]}(C, K^{(16)}) = K_{[k_1, \ldots, k_t]} \qquad (2)$

where $F^{(1)}_{[m_1, \ldots, m_v]}(P, K^{(1)})$ is the modulo two sum of some bits resulting from the
F-function output in the first round and $K^{(1)}$ is the subkey of round 1. A similar
notation is used for the last F-function.

The attack's main idea is related to the following assumption:

Assumption 1 (Wrong-key randomization hypothesis [3]). For any lin-
ear expression $\mathcal{L}$ operating on n rounds for which

$\left| \Pr\left[ \mathcal{L} = 0 \mid K^{(1)} = k^{(1)}, \ldots, K^{(n)} = k^{(n)} \right] - \frac{1}{2} \right|$

is large for virtually all values $(k^{(1)}, \ldots, k^{(n)})$ of the round keys, the following is
true: for virtually all possible full keys $(k^{(1)}, \ldots, k^{(n)})$ and for all estimates $k$ of
the last round key,

$\left| \Pr\left[ \mathcal{L} = 0 \mid K = k_r \right] - \frac{1}{2} \right| \gg \left| \Pr\left[ \mathcal{L} = 0 \mid K = k \right] - \frac{1}{2} \right| \quad \forall k \neq k_r \qquad (3)$

where $k_r$ is the right key.

^1 In the literature, this non-linearity measure is often called linear probability, and
expressed as $LP^f(a, b) = (2 \Pr[a \cdot x = b \cdot f(x)] - 1)^2$, where $a$ and $b$ are the masks
selecting the plaintext and ciphertext bits, respectively. In this paper, we will refer
to the bias $\epsilon$ for simplicity reasons.

Intuitively, the decryption of the first and the last round with wrong subkey can- 
didates can be considered as two rounds more of encryption. Thus, the plaintext 
and the ciphertext will be less dependent, and the linear expressions less biased. 
The first linear cryptanalysis phase (see Fig. 1) consists in evaluating the bias 
of both linear expressions for all possible subkey candidates and for all known 
plaintext-ciphertext pairs. In a second phase (Fig. 2), the two lists of subkey 
candidates corresponding each to a linear expression are sorted in a maximum- 
likelihood manner, combined, and the missing bits are finally searched exhaus- 
tively for each pair of subkey candidate until the right key is found. 

The complexity $C$ of the attack is then related to the number of needed DES
encryptions in the exhaustive search part while its success probability $P_C$ within
a given complexity $C$ is also related to the success while guessing the right part
of both linear expressions.



1: N = number of known plaintext-ciphertext pairs at disposal.
2: for linear expressions $\mathcal{L}_1$ and $\mathcal{L}_2$ do
3:   for all subkey candidates $k_i$, $1 \le i \le 2^{12}$ do
4:     $C_{k_i}$ = number of times out of N where left part of (2) is equal to 0 when
       $K = k_i$.
5:   end for
6: end for

Fig. 1. Matsui's algorithm 2 [7] (phase 1)



3 Implementation of the Attack 

The linear cryptanalysis attack against DES, except the exhaustive search part,
has been implemented as described in [7]. After having determined the rank of
the right subkey candidate in the final list, it is not difficult to compute^2 the
expected complexity (in DES function evaluations) of the exhaustive search part:

$E[C] = (r - 1) \cdot 2^{30} + 2^{29}$

^2 The strategy used to combine the two lists of 13-bit subkey candidates is Matsui's
proposed one [7]: sort the pairs by increasing $r = i \cdot j$ (see lines 12-13 of Fig. 2),
where $i$ and $j$ are the respective ranks in the 13-bit subkey lists.
1: for linear expressions $\mathcal{L}_1$ and $\mathcal{L}_2$ do
2:   Sort the $C_{k_i}$'s by decreasing $|\frac{N}{2} - C_{k_i}|$ and rename them $C^*_j$, $1 \le j \le 2^{12}$.
3:   for $1 \le j \le 2^{12}$ do
4:     /* $\kappa$ is defined in Sect. 2 (expected bias of $\mathcal{L}$) */
5:     if $(C^*_j - \frac{N}{2}) \cdot \kappa > 0$ then
6:       Guess $K_{[k_1, \ldots, k_t]} = 0$
7:     else
8:       Guess $K_{[k_1, \ldots, k_t]} = 1$
9:     end if
10:   end for
11: end for
12: Form $2^{24}$ $(C^*_i, C^*_j)_r$ pairs where $r := i \cdot j$.
13: Sort them by increasing $r$ and rename them $D_k$, $1 \le k \le 2^{24}$.
14: for $1 \le k \le 2^{24}$ do
15:   Fix the key bits given by $D_k$ and search exhaustively the remaining 30 bits of
      K until the right key is found.
16: end for

Fig. 2. Matsui's algorithm 2 [7] (phase 2)




where $r$ is the rank in the list $D$ of subkey candidates. The complexity's estima-
tion error has thus a maximal value of $2^{29}$ DES evaluations, which is negligible
almost all the time.
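The ranking and combination steps, together with the expected search cost, can be sketched on toy data as follows. The counter values, candidate labels and function names below are invented for illustration; only the sorting criteria and the cost formula come from the text.

```python
def rank_candidates(counters: dict, n_pairs: int) -> list:
    """Candidates sorted by decreasing |N/2 - counter|; index 0 has rank 1."""
    return sorted(counters, key=lambda k: abs(n_pairs / 2 - counters[k]), reverse=True)

def combine(list1: list, list2: list) -> list:
    """All candidate pairs, sorted by increasing product of their ranks."""
    combos = [((i + 1) * (j + 1), a, b)
              for i, a in enumerate(list1)
              for j, b in enumerate(list2)]
    combos.sort(key=lambda t: t[0])
    return [(a, b) for _, a, b in combos]

def expected_search_cost(r: int) -> int:
    """Expected DES evaluations when the right pair has rank r in the list D."""
    return (r - 1) * 2**30 + 2**29

# Toy counters for N = 100 known pairs (hypothetical values).
ranked1 = rank_candidates({"a": 40, "b": 52, "c": 61}, 100)
ranked2 = rank_candidates({"x": 49, "y": 70}, 100)
combined = combine(ranked1, ranked2)

print(ranked1, ranked2)
print(combined)
print(expected_search_cost(1), expected_search_cost(3))
```

The formula charges a full search of $2^{30}$ keys for each of the $r - 1$ wrong entries tried first, plus half a search on average for the right entry.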

The computationally most intensive part of the attack being data encryption, the
speed of the DES routine involved is a key parameter regarding the time needed to
process $2^{43}$ plaintexts. We have thus implemented a very fast DES routine using
the bitslicing concept [1] and some attack-related optimizations. Our routine has
been designed for the Intel MMX architecture which has eight 64-bit registers
at disposal. Although this platform has several drawbacks regarding a bitsliced
implementation [8], it has the advantage of being very common.

Kwan’s gate representation of the S-boxes [5] builds the core of the implementa- 
tion, the other parts of the cipher (key schedule, permutations, ...) being hard- 
coded. By eliminating parts of the cipher unrelated to the attack and by using 
advanced optimization techniques like instruction pairing, prefetching of the data 
and code unrolling, we managed to get an encryption speed of 183 Mbps on an 
Intel Pentium III clocked at 666 MHz. This represents 232.7 clock cycles for 
encrypting one block of data. One can hardly compare this number with exist- 
ing good implementations because of the optimizations related to the attack; 
however, using classical available implementations for our purposes would have 
resulted in poorer performances. 

^3 A DES routine was implemented for similar purposes in [12] on other platforms;
they report 62 Mbps on an Ultra SPARC 200 MHz and 336 Mbps on an Alpha 21164A
500 MHz. The significant speed difference on the latter platform is due to the large
number of available 64-bit registers (and thus to a smaller number of slow memory
accesses).

The attack has been run 21 times, using the idle time of 8 to 16 computers; this
represents between 3 and 6 days for a single run.



4 Success Probability 

In this section, we address a general way to characterize the probability distri- 
bution of the rank of the right 13-bit subkey in the list of candidates given by a 
linear approximation C. 



4.1 Rank Probability 



As the complexity $C$ of the attack is closely related to the rank of the right
subkey in the candidates list, we address first the problem of estimating the
rank distribution.

Let $W_1, \ldots, W_n$ be n independent and identically distributed continuous random
variables having $f_W(x)$ and $F_W(x)$ as common density function and distribution
function, respectively. Let $R$ be a continuous random variable independent of
the $W_i$'s and having $f_R(x)$ and $F_R(x)$ as density and distribution function. Sort
these $n + 1$ random variables in non-increasing order and rename them $Z_{(1)} \ge
Z_{(2)} \ge \ldots \ge Z_{(n+1)}$. Finally, let $\Psi$ be a discrete random variable taking values on
$\{1, \ldots, n + 1\}$ which models the rank of $R$ in the sorted list: $\Psi = \psi \iff Z_{(\psi)} = R$.

The distribution of $\Psi$ and its expected value are given by the following theorem,
whose proof is given in Appendix A.

Theorem 1. Under previous assumptions and for $1 \le \psi \le n \in \mathbb{N}$, the distribu-
tion function of $\Psi$ is equal to

$F_\Psi(\psi) = \int_{-\infty}^{+\infty} B_{n+1-\psi,\psi}(F_W(x)) \, f_R(x) \, dx$

and

$E[\Psi] = 1 + n \left( 1 - \int_{-\infty}^{+\infty} f_R(x) F_W(x) \, dx \right)$

where

$B_{a,b}(x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_0^x t^{a-1} (1 - t)^{b-1} \, dt$

is the incomplete beta function of order $(a, b)$.

In order to be able to compute the densities of the estimated biases^4, we first
have to make the following assumptions [13]; the first two are heuristic in
nature, while the last one is motivated by the law of large numbers. $C_{k_r}$ ($C_{k_w}$)
will denote a random variable modeling the counter value (as defined at line 4
of Fig. 1) in the case of a right (wrong) subkey candidate and N is the number
of known plaintext-ciphertext pairs.

^4 The mean and standard deviation of the counters and the respective biases of the
linear expression being linearly related, we will use in the following the bias termi-
nology.
Assumption 2. The bias

$B_w = \frac{1}{2} - \frac{C_{k_w}}{N}$

of a linear expression evaluated with wrong subkey candidates has a distribution
independent of the key value.

Assumption 3. The bias

$B_r = \frac{1}{2} - \frac{C_{k_r}}{N}$

of a linear expression evaluated with the right subkey candidates has a distribu-
tion independent of the distribution defined in Assumption 2 and independent of
the key value.

Assumption 4. The distributions of $B_w$ and $B_r$ are well approximated by a
normal law.



We denote in the following the normal law density with mean $\mu$ and variance
$\sigma^2$ by $\phi_{(\mu,\sigma^2)}$ and the corresponding cumulative distribution function by $\Phi_{(\mu,\sigma^2)}$.
Because the cryptanalyst ignores the linear expression's right part, she is more
interested in the absolute value of the biases. Noting that if $X$ is a normal
law $\mathcal{N}(\mu, \sigma^2)$, the density of $Y = |X - a|$, $a \in \mathbb{R}$, is given by

$f_{(\mu,\sigma^2)}(y, a) = \phi_{(\mu,\sigma^2)}(y + a) + \phi_{(\mu,\sigma^2)}(a - y)$ for $0 \le y < +\infty$,

the bias densities in case of wrong and right subkey candidates are respectively
given by

$f_W(x) = f_{(\mu_w, \sigma_w^2)}\left(x, \tfrac{1}{2}\right) \qquad (4)$

$f_R(x) = f_{(\mu_r, \sigma_r^2)}\left(x, \tfrac{1}{2}\right) \qquad (5)$

with

$\mu_r = E\left[\frac{C_{k_r}}{N}\right] = \frac{1}{2} + \kappa\epsilon_r, \qquad \mu_w = E\left[\frac{C_{k_w}}{N}\right] = \frac{1}{2} + \kappa\epsilon_w,$

$\sigma_r^2 = \mathrm{Var}\left[\frac{C_{k_r}}{N}\right] \approx \frac{1}{4N}, \qquad \sigma_w^2 = \mathrm{Var}\left[\frac{C_{k_w}}{N}\right] \approx \frac{1}{4N},$

where $\kappa \in \{-1, +1\}$ depends on the unknown key bits and $C_{k_r}$ ($C_{k_w}$) is the
random variable modeling the value of the counter corresponding to the
right (wrong) subkey. Fig. 3 gives some numerical evaluations of Theorem 1
for these densities while the following table gives the expected rank for various
amounts of known plaintext-ciphertext pairs at disposal. Here, we assume that
$\epsilon_r = 1.19 \cdot 2^{-21}$ is equal to the piling-up lemma approximation and that $\epsilon_w = 0$.
We note that Theorem 1 gives exactly the same values as Matsui's experimental
computations [7] regarding the cumulative rank probability of the right subkey
candidate.
N            $2^{43}$    $2^{42.5}$   $2^{42}$    $2^{41}$    $2^{40}$
$E[\Psi]$    71.3        182.5        361.9       847.3       1311.6

Fig. 3. Rank distribution $\Pr[\Psi \le \psi]$ for various amounts N of plaintext-ciphertext
pairs.
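The expected ranks in the table can be reproduced numerically. The sketch below relies on several assumptions that are not spelled out at this point of the text: both absolute biases are modeled as folded normal variables with standard deviation $1/(2\sqrt{N})$, the wrong-key mean is zero, and there are $n = 2^{12} - 1$ wrong candidates per list. Under these assumptions the integral in Theorem 1 has the closed form $\Pr[W > R] = 2\,Q(t)\,(1 - Q(t))$ with $t = \epsilon_r \sqrt{2N}$, where $Q$ is the standard normal tail. This closed form is our own simplification, not a formula from the paper.

```python
from math import erfc, sqrt

EPS_R = 1.19 * 2**-21      # right-key bias (piling-up approximation, from the text)
N_WRONG = 2**12 - 1        # assumed number of wrong candidates in one list

def q(t: float) -> float:
    """Standard normal tail probability Pr[Z > t]."""
    return 0.5 * erfc(t / sqrt(2))

def expected_rank(n_pairs: float, eps_r: float = EPS_R, n_wrong: int = N_WRONG) -> float:
    """E[rank] = 1 + n * Pr[W > R] for folded normals with a common sigma
    and a zero-mean wrong-key bias (assumptions stated in the lead-in)."""
    t = eps_r * sqrt(2 * n_pairs)
    p_wrong_beats_right = 2 * q(t) * (1 - q(t))
    return 1 + n_wrong * p_wrong_beats_right

for exponent in (43, 42.5, 42, 41, 40):
    print(exponent, round(expected_rank(2**exponent), 1))
```

The printed values agree with the table to the displayed precision, which supports the reading of the list size as $2^{12}$ candidates.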



4.2 Success Probability 

The attack's success probability $P_C$ within a given complexity $C$ is also dependent
on the error probability while guessing the bit of information about $K_{[k_1, \ldots, k_t]}$.
Using the same assumptions as during the previous computations, it is easy to
compute this error probability (in the case where $\kappa = +1$ and $K_{[k_1, \ldots, k_t]} = 0$, the
other ones being symmetric).

$p_{wg} = \Pr\left[\text{“}K_{[k_1, \ldots, k_t]} \text{ wrongly guessed”}\right] = \Phi_{(\mu_r, \sigma_r^2)}\left(\tfrac{1}{2}\right) \qquad (6)$

The following table gives some numerical approximations for various N:

N           $2^{43}$    $2^{42.5}$   $2^{42}$    $2^{41}$    $2^{40}$
$p_{wg}$    0.0004      0.0023       0.0086      0.0462      0.1170
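The guessing error can be evaluated in the same model. The sketch below assumes a normal counter ratio with mean $\frac{1}{2} + \epsilon_r$ and standard deviation $1/(2\sqrt{N})$, so that the error is the probability of falling below $\frac{1}{2}$; the function name is hypothetical and $\epsilon_r = 1.19 \cdot 2^{-21}$ is the rounded value from the text, so the last digit may differ slightly from the table.

```python
from math import erfc, sqrt

def p_wrong_guess(n_pairs: float, eps_r: float = 1.19 * 2**-21) -> float:
    """Pr[counter ratio < 1/2] when the ratio is normal with mean 1/2 + eps_r
    and standard deviation 1/(2*sqrt(N)) (assumed model)."""
    sigma = 1 / (2 * sqrt(n_pairs))
    return 0.5 * erfc((eps_r / sigma) / sqrt(2))

for exponent in (43, 42.5, 42, 41, 40):
    print(exponent, round(p_wrong_guess(2**exponent), 4))
```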



5 Experimental Linear Expressions Biases 

A key parameter regarding the linear cryptanalysis success is of course the bias
of the involved linear expression(s). As it is infeasible to compute the exact bias
of a linear expression, one uses implicit assumptions, such as the wrong-key ran-
domization one and the independence of data between two successive rounds.
The incidence of these assumptions has been well discussed in the literature
[9, 2, 3, 4]. Although several situations where these assumptions can fail have been
suggested and discussed, it is accepted that the linear expression real bias should
be well approximated in the case of DES.

The experimental results go in this direction. We have computed the sample
means of the experimental biases $B_r$ and $B_w$ which can be compared to the
expected values of densities (4) and (5).

In case of right key, the sample mean is equal to $5.5 \cdot 10^{-7}$ with a standard
deviation of $0.2 \cdot 10^{-7}$. This value has to be compared with the one given by
the piling-up approximation and (5), $E[B_r] = 5.674 \cdot 10^{-7}$. As a first observa-
tion, one can note that the linear hull effect [9] is not visible for DES, the mean
experimental bias being not perceptibly greater than the piling-up lemma ap-
proximation.

Our experiments provide furthermore a good opportunity to confirm the va-
lidity of Assumption 1. The sample mean in case of wrong subkey candidates,
averaged over all the wrong subkeys and all experiments, is equal to $1.38 \cdot 10^{-7}$
with a standard deviation of $0.03 \cdot 10^{-7}$. This value has to be compared with
$E[B_w] = 1.345 \cdot 10^{-7}$ given by $\epsilon_w = 0$ and (4). Obviously, as one could expect,
the mean seems to be slightly greater than for a perfect cipher and thus the
plaintext and ciphertext are still correlated. However, the bias values for the
wrong candidates are not on the same scale as those for the right candidates,
confirming the validity of Assumption 1 for DES.



6 Experimental Results 

It is widely accepted that linear cryptanalysis of DES, given $2^{43}$ known plaintext-ciphertext pairs, has a success probability of 85 % within a complexity of $2^{43}$ DES encryptions, which are the values given in [7]. Our experimental results suggest a lower complexity.



6.1 Rank and Guessing Error Probabilities 

Each of the 21 experiments provides two statistical samples. The following table summarizes our results about the ranks of the right subkey candidates for various amounts N of known plaintext-ciphertext pairs and compares them to the theoretical expectations (values in parentheses) given by Theorem 1.

N                 $2^{43}$     $2^{42.5}$    $2^{42}$     $2^{41}$     $2^{40}$
$\Psi \le 5$      20 (22)      13 (13.5)     7 (7.6)      0 (2.3)      0 (0.8)
$\Psi \le 10$     27 (25.8)    16 (17.1)     9 (10.5)     2 (3.6)      0 (1.3)
$\Psi \le 50$     33 (33.6)    26 (26.2)     18 (18.8)    5 (8.6)      2 (3.9)
$\Psi \le 150$    38 (37.7)    34 (32.3)     24 (25.7)    10 (14.3)    5 (7.7)
$\Psi \le 300$    42 (39.4)    39 (35.7)     31 (30.3)    17 (19.2)    14 (11.6)
$\Psi \le 600$    42 (40.8)    40 (38.5)     35 (34.6)    25 (24.7)    22 (16.8)
E[Ψ]              38 (71)      129 (182)     302 (362)    654 (847)    1121 (1312)

We observe that Theorem 1 seems to give a pessimistic expected rank. It is difficult to explain this fact because of the small statistical sample size. Furthermore, we have noticed that Theorem 1 is very sensitive numerically. For instance, the expected rank E[Ψ] is equal to 113 and to 39 when we assume that $\epsilon_r = 1.1 \cdot 2^{-21}$ and $\epsilon_r = 1.3 \cdot 2^{-21}$, respectively.

The experimental results regarding the probability of wrongly guessing the remaining bit are summarized in the following table. The number $n_{wg}$ of cases where the guessing phase was unsuccessful is reported, together with the theoretical expected values given by (6), shown in parentheses.

N          $2^{43}$    $2^{42.5}$   $2^{42}$    $2^{41}$    $2^{40}$
$n_{wg}$   0 (0.02)    0 (0.10)     0 (0.36)    0 (1.94)    1 (4.91)

One can see that (6) is a bit pessimistic, which can be explained once again by the arguments developed above. We note furthermore that the success probability of the linear cryptanalysis of DES within a given complexity C seems not to depend much on the guessed bit of information about the key, and that the key factor regarding this success probability is the given upper bound C.
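The expected counts quoted in parentheses for the guessing phase can be reproduced from the error probabilities $P_{wg}$ tabulated in Section 4.2. The following is a minimal sketch, assuming that each of the 21 experiments contributes two samples (42 in total) and that the expected number of wrong guesses is simply the sample count times $P_{wg}$; the names `P_WG`, `SAMPLES` and `expected_wrong_guesses` are illustrative only.

```python
# Theoretical guessing-error probabilities P_wg, copied from the table in Section 4.2.
P_WG = {"2^43": 0.0004, "2^42.5": 0.0023, "2^42": 0.0086, "2^41": 0.0462, "2^40": 0.1170}

# Assumption: 21 experiments, each providing two statistical samples.
SAMPLES = 2 * 21


def expected_wrong_guesses(p_wg, samples=SAMPLES):
    """Expected number of unsuccessful guessing phases among `samples` trials."""
    return samples * p_wg


expected = {n: round(expected_wrong_guesses(p), 2) for n, p in P_WG.items()}

for n, value in expected.items():
    print(f"N = {n}: expected n_wg = {value:.2f}")
```

Under these assumptions the rounded values agree with the parenthesized entries of the table, which suggests that the theoretical column was obtained in this way.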



6.2 Complexity of the Attack 

An exhaustive table of our experimental results regarding the complexity is given in Appendix B. Key statistics (mean, median, minimal and maximal C) are summarized in the following table, where a value of x means $2^x$ DES evaluations:

N                 $2^{43}$   $2^{42.5}$   $2^{42}$   $2^{41}$   $2^{40}$
$\mu_C$ (mean)    41.4144    47.1516      48.9504    50.2121    51.4154
$C_{med}$         38.1267    41.8023      44.2949    48.5492    51.0533
$C_{min}$         32.1699    29.0000      36.5157    43.8552    41.9750
$C_{max}$         45.4059    51.2973      52.3671    52.1953    53.1000

Our experimental results lead to the following observations:

— Given $2^{43}$ known plaintext-ciphertext pairs, our experiments have a complexity of less than $2^{41}$ DES evaluations with a success probability of 86 %, and more than half of the cases have a complexity of less than $2^{39}$. Furthermore, if an attacker is ready to decrease her success probability, the
complexity drops dramatically (less than 2^^ DES evaluations with a success 
probability of 10 %). 

— Given $2^{42.5}$ known plaintext-ciphertext pairs (i.e. with 30 % fewer pairs), half of the experiments have a complexity of less than $2^{42}$ DES evaluations.

— With only 2^^ pairs at disposal, the complexity is far lower than an exhaus- 
tive search. 

Even if we have to treat these experimental results carefully because of the relatively small number of statistical samples, they strongly suggest a lower complexity than expected by Matsui in [7], and we risk the following conjecture:

Proposition 1. Given $2^{43}$ known plaintext-ciphertext pairs, it is possible to recover a DES key using Matsui’s linear cryptanalysis within a complexity of $2^{41}$ DES evaluations with a success probability of 85 %.
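The figures behind this conjecture can be checked against the raw data. The sketch below recomputes the summary statistics for $N = 2^{43}$ from the 21 complexities listed in Appendix B (given there as base-2 logarithms of the number of DES evaluations). It assumes that the reported mean is taken over the actual complexities $2^x$ and then converted back to a logarithm; the variable names are illustrative.

```python
import math
import statistics

# log2 of the attack complexity for N = 2^43, one entry per experiment (Appendix B).
C_43 = [
    39.1836, 33.2479, 38.6055, 38.1267, 37.4878, 34.0444, 36.4676,
    36.1189, 40.3515, 41.6540, 45.4059, 36.1189, 36.4009, 39.0042,
    37.6330, 38.9204, 33.5236, 39.8478, 32.1699, 40.7503, 41.8721,
]

median = statistics.median(C_43)
below_41 = sum(c < 41 for c in C_43)   # runs cheaper than 2^41 DES evaluations
below_39 = sum(c < 39 for c in C_43)   # runs cheaper than 2^39 DES evaluations

# Assumption: the mean is computed on 2^x and reported as a logarithm.
log2_mean = math.log2(sum(2.0 ** c for c in C_43) / len(C_43))

print(f"median = {median}, min = {min(C_43)}, max = {max(C_43)}")
print(f"runs below 2^41: {below_41}/{len(C_43)}, below 2^39: {below_39}/{len(C_43)}")
print(f"log2 of mean complexity = {log2_mean:.4f}")
```

With these data, 18 of the 21 runs (about 86 %) stay below $2^{41}$ and 13 of them stay below $2^{39}$.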



7 Conclusion 

The first goal of this research was to perform an experimental linear cryptanalysis 
of DES as many times as possible in order to get a better insight into the real 
complexity and success probability of this attack. Using a very fast DES function 
developed for the Intel MMX architecture, we have simulated Matsui’s attack 
21 times. 

Our experimental results suggest a lower complexity than estimated by Matsui. Given $2^{43}$ known plaintext-ciphertext pairs, the complexity was upper-bounded by $2^{41}$ DES evaluations with a success probability of 85 %. This has to be compared with the estimated $2^{43}$.

We furthermore give a detailed theoretical analysis of the rank probability of the right subkey in the list of candidates, confirming Matsui’s experimental results, and we discuss the validity of our theoretical model with respect to the experimental results, together with several issues regarding past research.

A Proof of Theorem 1

As a first step, let’s consider the following situation: let $W_1, W_2, \ldots, W_n$ be n independent and identically distributed continuous random variables having $f_W$ as density function and $F_W$ as distribution function. We arrange the values of $W_1, W_2, \ldots, W_n$ in strictly⁵ increasing order and denote them by $W_{(1)} < W_{(2)} < \ldots < W_{(n)}$. The distribution function $F_{W_{(i)}}$ of $W_{(i)}$ is given by the following Lemma whose proof can be found in [11].

Lemma 1. Under previous assumptions, the distribution function of the i-th smallest random variable is

$$F_{W_{(i)}}(x) = B_{i,\,n-i+1}(F_W(x)),$$

where

$$B_{a,b}(x) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt$$

is the incomplete beta function of order (a, b).

⁵ The probability that equal values occur is 0.

By using the previous Lemma and the independence between the involved random variables, we can compute the distribution function of Ψ as follows:

$$\Pr[\Psi \le \psi] = \Pr[W_{(n+1-\psi)} < R] = \int_{-\infty}^{+\infty}\int_{-\infty}^{y} f_{W_{(n+1-\psi)}}(x)\, f_R(y)\,dx\,dy = \int_{-\infty}^{+\infty} B_{n+1-\psi,\,\psi}(F_W(y))\, f_R(y)\,dy.$$

By definition, we have

$$E[\Psi] = \sum_{\psi=1}^{n+1} \psi \cdot \Pr[\Psi = \psi] = \Pr[\Psi = 1] + \sum_{\psi=2}^{n+1} \psi\,\bigl(\Pr[\Psi \le \psi] - \Pr[\Psi \le \psi - 1]\bigr) = n + 1 - \sum_{\psi=1}^{n} \Pr[\Psi \le \psi],$$

where

$$\sum_{\psi=1}^{n} \Pr[\Psi \le \psi] = \sum_{\psi=1}^{n} \int_{-\infty}^{+\infty} B_{n+1-\psi,\,\psi}(F_W(y))\, f_R(y)\,dy = \int_{-\infty}^{+\infty} f_R(y) \sum_{\psi=1}^{n} B_{n+1-\psi,\,\psi}(F_W(y))\,dy.$$

It is easy to see that

$$\sum_{\psi=1}^{n} B_{n+1-\psi,\,\psi}(F_W(y)) = \int_0^{F_W(y)} \sum_{\psi=1}^{n} \frac{n!}{(n-\psi)!\,(\psi-1)!}\, t^{n-\psi}(1-t)^{\psi-1}\,dt = \int_0^{F_W(y)} n\,dt = n F_W(y),$$

and we can thus conclude with

$$E[\Psi] = n + 1 - \int_{-\infty}^{+\infty} f_R(y) \sum_{\psi=1}^{n} B_{n+1-\psi,\,\psi}(F_W(y))\,dy = n + 1 - n\int_{-\infty}^{+\infty} f_R(y) F_W(y)\,dy = 1 + n\left(1 - \int_{-\infty}^{+\infty} f_R(y) F_W(y)\,dy\right).$$


B Complete Experimental Results 

This table gives the experimental results regarding the complexity C of each run of the attack for various amounts of plaintext-ciphertext pairs.

Exp   N = $2^{43}$   N = $2^{42.5}$   N = $2^{42}$   N = $2^{41}$   N = $2^{40}$
 1    39.1836        38.4818          45.0307        51.3802        51.0533
 2    33.2479        41.6346          43.6383        48.0928        43.1913
 3    38.6055        41.8023          43.9622        48.5492        51.6012
 4    38.1267        34.6147          41.3351        48.7240        51.2041
 5    37.4878        29.0000          36.5157        46.1991        52.3685
 6    34.0444        44.2753          46.6834        48.5221        50.1937
 7    36.4676        45.5732          44.2949        47.3010        51.2913
 8    36.1189        44.7722          41.4091        51.6338        52.1143
 9    40.3515        47.0565          48.6184        52.1953        53.1000
10    41.6540        41.8682          45.7429        47.9120        41.9750
11    45.4059        51.2973          51.9932        51.8155        52.1972
12    36.1189        43.6633          46.7256        50.3949        49.2317
13    36.4009        36.1189          43.2183        47.0756        46.7680
14    39.0042        42.6736          44.3057        44.7116        47.3256
15    37.6330        39.8572          47.6536        49.5244        52.6439
16    38.9204        36.6653          41.5447        49.1082        49.9939
17    33.5236        38.8502          43.3128        46.1030        48.6798
18    39.8478        47.4938          52.3671        50.6770        50.3675
19    32.1699        31.8074          40.5093        43.8552        48.4968
20    40.7503        38.3729          40.3734        45.2436        52.3101
21    41.8721        44.9063          45.4147        52.0730        52.8571



Abstract. This paper extends the analysis of Pollard’s rho algorithm
for solving a single instance of the discrete logarithm problem in a finite 
cyclic group G to the case of solving more than one instance of the 
discrete logarithm problem in the same group G. We analyze Pollard’s 
rho algorithm when used to iteratively solve all the instances. We also 
analyze the situation when the goal is to solve any one of the multiple 
instances using any DLP algorithm. 



1 Introduction 

The security of many public-key cryptographic systems is based on the discrete 
logarithm problem (DLP). Examples are the Diffie-Hellman key agreement pro- 
tocol and the ElGamal encryption and signature schemes. 

The DLP can be defined as follows: Let g be a generator of a finite cyclic group $G = \langle g \rangle$ of order N. For the general DLP, we have to find an integer x ($0 \le x < N$) such that $g^x = h$, where h is chosen uniformly at random from G (written $h \in_R G$). The integer x is called the discrete logarithm of h to the base g, denoted $\log_g h$. If N is composite, one can compute $x \bmod p^k$ in the subgroup of order $p^k$ for each prime power $p^k$ dividing N. Then, one can compute x by application of the Chinese Remainder Theorem. Further, calculating the discrete logarithm in the subgroup of order $p^k$ can be reduced to finding the discrete logarithm in the group of prime order p (see [7]). For these reasons, we only consider the DLP in groups of prime order N.

Shoup [10] gave a lower bound for the running time for computing discrete logarithms by generic algorithms (probabilistic or deterministic) in groups of prime order. The time needed to solve the DLP with a non-negligible probability is at least $c\sqrt{N}$ group operations for some constant c. The best algorithm known for solving the general DLP is Pollard’s rho algorithm [8]. It not only matches Shoup’s lower bound, but also needs very little memory and is parallelizable with a linear speed-up (see [6]). For many groups of cryptographic interest, such as the multiplicative group of a finite field (see [1]) and the Jacobians of hyperelliptic curves of high genus (see [2]), there are subexponential-time algorithms known for the DLP that are more efficient than Pollard’s rho algorithm. However, Pollard’s rho algorithm is the best algorithm known for solving the DLP in some groups such as the group of points on an elliptic curve, and the Jacobian of genus 2 and 3 hyperelliptic curves. Thus, the results in this paper are particularly relevant to the DLP in elliptic curve groups and in genus 2 and 3 hyperelliptic curves.

This paper extends the analysis of Pollard’s rho algorithm for solving a single 
instance of the discrete logarithm problem in a finite cyclic group G to the case 
of solving more than one instance of the discrete logarithm problem in the same 
group G. Pollard’s rho algorithm is reviewed in §2. In §3, we provide a runtime 
analysis in an idealized model and do an exact analysis of possible time-memory 
trade-offs for the parallelized version. When using Pollard’s rho algorithm to 
iteratively solve all n instances of the DLP in the same group, the data that is 
gathered during the calculation of a single discrete logarithm can be used to
compute subsequent discrete logarithms. Thus, the additional time needed for 
every new DLP may be smaller than the time needed to solve the one before. A 
careful analysis for this case is provided in §4. In §5 we consider the case where 
the goal is to solve any one of a set of n DLPs in the same group using any DLP 
algorithm. 



2 Pollard’s Rho Algorithm 

2.1 Basic Idea 

Pollard’s rho algorithm is based on the birthday paradox. If we randomly choose elements (with replacement) from a set of N numbered elements, we only need to choose about $\sqrt{N}$ elements until we get one element twice (called a collision). This can be applied to find discrete logarithms as follows. By choosing $a, b \in_R [0, N-1]$, one obtains a random group element $g^a h^b$. Such group elements are randomly selected until we get a group element twice. If $g^{a_i}h^{b_i}$ and $g^{a_j}h^{b_j}$ represent the same group element then $a_i + b_i x \equiv a_j + b_j x \pmod N$, whence

$$x = (a_j - a_i)(b_i - b_j)^{-1} \bmod N \quad \text{for } b_i \not\equiv b_j \pmod N. \qquad (1)$$

Let T be the random variable describing the number of group elements chosen until the first collision occurs. We denote the probability that T > k by $p_k$. We have

$$p_k = \left(1 - \frac{1}{N}\right)\left(1 - \frac{2}{N}\right)\cdots\left(1 - \frac{k-1}{N}\right) \approx e^{-k^2/(2N)}. \qquad (2)$$

For $k \in O(\sqrt{N})$, the relative error of the above approximation is $O(N^{-1/2})$. As shown in Appendix B, the expected value of T is $E(T) \approx \sqrt{\pi N/2}$. The first






Fabian Kuhn and Rene Struik 



collision can be found by simply storing all the randomly selected group elements until a repeat is detected. However, this simple-minded method has an expected storage requirement of $\sqrt{\pi N/2}$ group elements.
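To make Equation (1) concrete, here is a small worked example. It is only a sketch under illustrative assumptions: the group is the subgroup of order N = 11 of the integers modulo p = 23, generated by g = 2, and the secret exponent, the two colliding representations and the helper names are all chosen for the illustration.

```python
# Toy parameters (assumptions for illustration): g = 2 has prime order N = 11 modulo p = 23.
p, N, g = 23, 11, 2
x_secret = 7
h = pow(g, x_secret, p)


def element(a, b):
    """Group element g^a * h^b (mod p)."""
    return (pow(g, a, p) * pow(h, b, p)) % p


def dlog_from_collision(ai, bi, aj, bj):
    """Apply Equation (1): x = (aj - ai) * (bi - bj)^(-1) mod N."""
    if (bi - bj) % N == 0:
        raise ValueError("useless collision: b_i = b_j (mod N)")
    return ((aj - ai) * pow((bi - bj) % N, -1, N)) % N


# Two different representations of the same group element.
assert element(3, 4) == element(2, 1)
x = dlog_from_collision(3, 4, 2, 1)
print("recovered x =", x)
```

The modular inverse is computed with the three-argument `pow`, which accepts a negative exponent from Python 3.8 onwards.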

2.2 The Single Processor Case 

The question now is how to detect a collision without having to store $\sqrt{\pi N/2}$ group elements. In Pollard’s rho algorithm, this is done by means of a random function¹ $f : G \to G$. For actual implementations, f is chosen such that it approximates a random function as closely as possible. Further, it should be calculated with a single group multiplication and map an element $g^a h^b$ to an element $g^c h^d$ so that c and d can easily be computed from a and b. The function originally suggested by Pollard (for $\mathbb{Z}_p^*$) can be generalized to arbitrary cyclic groups as

$$f(x) = \begin{cases} hx & \text{if } x \in S_1; \\ x^2 & \text{if } x \in S_2; \\ gx & \text{if } x \in S_3. \end{cases}$$

Here, $S_1$, $S_2$ and $S_3$ are three sets of roughly the same size which form a partition of G. In [12,13], Teske shows that this function is not random enough and gives a better function:

$$f(x) = x \cdot g^{m_s} h^{n_s} \quad \text{if } x \in M_s \text{ for } s \in \{1, \ldots, r\} \text{ and } r \approx 20.$$

Here again, the $M_s$ are of roughly the same size and form a partition of G. But this time, G is partitioned into more than three subsets. For both functions, it is of course necessary that determining the subset $M_i$, resp. $S_i$, to which a group element belongs is very efficient.

By starting at a random point $g^{a_0}h^{b_0}$ and iteratively applying a random function, random points $g^{a_i}h^{b_i}$ are generated. Because the group is finite, we eventually arrive at a point for the second time. The sequence of subsequent points then cycles forever. From §2.1 we know that the first repeat happens after an expected $E(T) \approx \sqrt{\pi N/2}$ function applications. With very little time and space overhead, it is possible to detect such a cycle with Floyd’s cycle-finding algorithm (or with an improved variant by Brent [3]).
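The single-processor procedure can be sketched as follows. This is an illustrative implementation, not the authors' code: it works in a prime-order subgroup of the integers modulo a prime p, uses the three-set iteration function described above with the partition chosen as `y % 3` (an assumption made for simplicity), detects the cycle with Floyd's method, and restarts from a new random point when the collision is useless ($b_i \equiv b_j$). The function name and parameters are hypothetical.

```python
import random


def pollard_rho_dlog(g, h, p, N, seed=0, max_tries=100):
    """Return x with g^x = h (mod p), where g has prime order N modulo p."""
    rng = random.Random(seed)

    def step(y, a, b):
        # Invariant: y == g^a * h^b (mod p).
        s = y % 3
        if s == 0:                                   # y in S1: multiply by h
            return (y * h) % p, a, (b + 1) % N
        if s == 1:                                   # y in S2: square
            return (y * y) % p, (2 * a) % N, (2 * b) % N
        return (y * g) % p, (a + 1) % N, b           # y in S3: multiply by g

    for _ in range(max_tries):
        a0, b0 = rng.randrange(N), rng.randrange(N)
        start = ((pow(g, a0, p) * pow(h, b0, p)) % p, a0, b0)
        # Floyd: the tortoise moves one step at a time, the hare two.
        tortoise, hare = step(*start), step(*step(*start))
        while tortoise[0] != hare[0]:
            tortoise = step(*tortoise)
            hare = step(*step(*hare))
        (_, ai, bi), (_, aj, bj) = tortoise, hare
        if (bi - bj) % N != 0:                       # useful collision, apply (1)
            return ((aj - ai) * pow((bi - bj) % N, -1, N)) % N
    raise RuntimeError("no useful collision found")


# Toy example: p = 1019 = 2 * 509 + 1, and g = 4 generates the subgroup of order 509.
p, N, g = 1019, 509, 4
print(pollard_rho_dlog(g, pow(g, 123, p), p, N))
```

Only three triples (start, tortoise, hare) are kept in memory at any time, which is the point of using cycle detection instead of storing all visited elements.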

2.3 Parallelization of Pollard’s Rho Algorithm 

Unfortunately, iteratively applying a function is an inherently serial process and cannot efficiently be parallelized. If m processors run the Pollard-rho algorithm as described above, the speed-up when compared to the single processor case is only about $\sqrt{m}$. For, if the processors run the algorithm individually, the probability that none of them has found a collision after k steps is $p_k^m \approx e^{-k^2 m/(2N)}$. This leads to an expected time of $\sqrt{\pi N/2}/\sqrt{m}$ for finding the first collision.

If, however, the processors communicate with each other, we can do better. If we could detect and use any collision which occurs between two processors, the speed-up to the single processor case would be a factor m because m processors calculate m times as many points as a single processor does.

¹ A random function is a function that is chosen uniformly at random from the set of all functions $f : G \to G$.

In [6], Wiener and van Oorschot presented a very elegant way of parallelizing Pollard’s rho algorithm which is based on distinguished points. A distinguished point is a group element with an easily testable property. An often used distinguishing property is whether a point’s binary representation has a certain number of leading zeros. Each processor starts the iteration at a different random element (but all have the same iteration function). As soon as the iteration hits a distinguished point, this point will be sent to a central server and the processor starts a new iteration. The server stores all collected points ($a_i$, $b_i$ and $y_i = g^{a_i}h^{b_i}$) in a hash table. As soon as the server has received the same point twice, it has two representations $g^{a_i}h^{b_i}$ and $g^{a_j}h^{b_j}$ for a group element and can calculate the discrete logarithm x of h as given in (1).

As soon as a point occurs in two iterations, the remainder of those two iteration trails will be the same and thus lead to the same distinguished point. Therefore, by performing the iterations, all processors calculate random group elements of the form $g^a h^b$ and as soon as the same element has been calculated twice, we are going to get the same distinguished point twice, as well. If the two representations of the point where the trails collided are different, the representations of this distinguished point are different too, and we are therefore able to calculate x.

3 Analysis of the Parallelized Pollard’s Rho Algorithm 

For our analysis, we make the following assumptions (cf. §3.3): 

1. The iterative function really behaves like a random mapping and thus gen- 
erates uniformly distributed random group elements. 

2. All collisions are useful, i.e. the collision reveals two representations $g^{a_i}h^{b_i}$ and $g^{a_j}h^{b_j}$ of a group element with $b_i \not\equiv b_j \pmod N$.

3. All trails lead to distinguished points (i.e., we neglect the existence of iteration paths which eventually run into a cycle that does not contain a distinguished point).

We denote the number of processors by m. The proportion of the points that constitute distinguished points is called $\theta$ (i.e., there are $\theta N$ distinguished points). Additionally, for the analysis, we assume that all processors operate at the same speed.

3.1 Running Time 

The runtime of the Pollard-rho algorithm can be divided into two statistically independent phases. First, all processors have to calculate points until a collision occurs. We already know that an expected $\sqrt{\pi N/2}$ points must be calculated for this part of the algorithm. Because all m processors calculate their points independently, the expected time for this part is $\sqrt{\pi N/2}/m$ function iterations.

After a collision, the iteration has to be continued until it arrives at a distinguished point. Because for each function application, the probability to come to a distinguished point is $\theta$, the number of steps from a collision to its detection is a geometrically distributed random variable with expected value $1/\theta$.

The iteration function is such that the time for one function application is equal to the time of one group operation plus a negligible overhead. Thus, the overall expected value for the running time of the parallel Pollard-rho algorithm is $E(T) = \sqrt{\pi N/2}/m + 1/\theta$ group operations.



3.2 Memory Requirements 

Essentially, the only memory needed for the parallel version of Pollard’s rho algorithm is that for storing the distinguished points on the server². For every iteration trail, the server has to store one distinguished point. The length of a trail portion between distinguished points is geometrically distributed with parameter $\theta$. Therefore, the expected length of such an iteration trail is $1/\theta$. This means that for the whole duration of the algorithm, each processor will send a distinguished point to the server every $1/\theta$ group operations on average. Therefore, because the time until a collision occurs and the average length of the trails are assumed to be statistically independent, the expected space needed on the server is $E(S) = m\theta E(T) = \theta\sqrt{\pi N/2} + m$ distinguished points. Note that for each distinguished point, we have to store the group element $g^a h^b$ and the integers a and b. Therefore, the actual space needed to store one distinguished point is $O(\log N)$ bits.

For the memory requirements to be as small as possible, we have to choose $\theta$ as small as possible. But of course, if $\theta$ gets smaller, the time overhead $1/\theta$ to detect a collision gets bigger. In order to keep the overall running time in $O(\sqrt{N}/m)$, we have to choose $\theta$ in $\Omega(m/\sqrt{N})$. Therefore, we choose $\theta$ as $\theta = \alpha m/\sqrt{\pi N/2}$. The expected values for time and space then become $E(T) = (1 + 1/\alpha)\sqrt{\pi N/2}/m$ and $E(S) = m(1 + \alpha)$. We see that there is a time-space trade-off. But even if we choose the constant $\alpha$ quite big, the space requirements are still small. Therefore, the limiting factor for solving discrete logarithms with the parallel rho algorithm is definitely time.
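The time-space trade-off can be explored numerically. The sketch below evaluates the expected running time $E(T) = \sqrt{\pi N/2}/m + 1/\theta$ and the expected storage $E(S) = m\theta E(T)$ for $\theta = \alpha m/\sqrt{\pi N/2}$; the function names and the example values of N, m and $\alpha$ are assumptions chosen for illustration.

```python
import math


def theta_for(N, m, alpha):
    """Proportion of distinguished points: theta = alpha * m / sqrt(pi * N / 2)."""
    return alpha * m / math.sqrt(math.pi * N / 2)


def expected_time(N, m, theta):
    """Expected group operations per processor: sqrt(pi * N / 2) / m + 1 / theta."""
    return math.sqrt(math.pi * N / 2) / m + 1 / theta


def expected_space(N, m, theta):
    """Expected number of distinguished points stored: m * theta * E(T)."""
    return m * theta * expected_time(N, m, theta)


# Illustrative values: an 80-bit group order and 1000 processors.
N, m = 2 ** 80, 1000
for alpha in (1, 10, 100):
    theta = theta_for(N, m, alpha)
    print(f"alpha = {alpha:3d}: E(T) = 2^{math.log2(expected_time(N, m, theta)):.2f}, "
          f"E(S) = {expected_space(N, m, theta):.0f} points")
```

Increasing $\alpha$ from 1 to 100 reduces the detection overhead from 100 % to 1 % of the collision time, while the storage grows only linearly in $\alpha$.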

Remark 1. We have assumed that all distinguished points are collected by a single server. However, it is possible to parallelize the server side with no communication overhead. Assume that k servers collect the distinguished points. One could split up the distinguished point set $\mathcal{D}$ into k disjoint subsets $\mathcal{D}_i$ of roughly the same size. Server i would then only collect the points of $\mathcal{D}_i$. When a client gets a distinguished point, it would have to check to which subset $\mathcal{D}_i$ it belongs and send it to the appropriate server. Checking if a new distinguished point has already been computed previously can be done independently on each server.

² All clients also need to store a description of the iteration function. This, however, requires only $O(\log N)$ bits per client.

3.3 Assumptions of the Analysis 

At the beginning of §3, we made three assumptions on which we based our time 
and space analysis. We will now elaborate on how realistic these assumptions 
are in an actual implementation. 

Randomness of the function: For our analysis, we assumed that the iteration 
function is perfectly random and therefore produces uniformly distributed group 
elements. In [12,13], Teske shows that the function suggested by her behaves 
practically like a truly random function if the group elements are partitioned 
into about 20 subsets. 

All collisions are useful: A collision reveals two representations of the form $g^{a_i}h^{b_i}$ and $g^{a_j}h^{b_j}$ of the same group element. If $b_i \not\equiv b_j \pmod N$, the collision can be used to calculate x. Because the $b_i$ are random elements of $\mathbb{Z}_N$, the probability for this is $1 - 1/N$. Therefore, the probability that a collision is not useful is $1/N$ and thus negligible.

Each iteration reaches a distinguished point: In [9], Schulte-Geers shows that the distinguished point set must be at least of size $c\sqrt{N}$, where c should not be too small. This is intuitively clear, since the only way for an iteration not to arrive at a distinguished point is to end up in a cycle without distinguished points, the expected length of which is $\sqrt{\pi N/8}$. The condition is certainly met by our distinguished point set (c is $\alpha m\sqrt{2/\pi}$ in our case). Schulte-Geers also

finds that if we choose Q as described in §3.2, the proportion of starting points 
with iterations that end up in distinguished points is 1/(1 -I- ) > where 

A/”(0, 1) is a standard normally distributed random variable. 

Further, Schulte-Geers shows that if $\theta \gg 1/\sqrt{N}$, only a negligible number of starting points will miss the distinguished point set. We could meet this requirement by setting $\alpha$ to $O(\log N)$. The space requirements still remain very small.

Additionally, van Oorschot and Wiener [6] suggest abandoning all trail portions without a single distinguished point that are longer than $k/\theta$, i.e., k times their expected length. The proportion of time wasted through abandoned trails can be estimated³ as $k(1 - \theta)^{k/\theta} \approx k e^{-k}$, which is very small.

³ Here, we assume the length of the trail portions between subsequent distinguished points to be geometrically distributed. Note that this model is slightly inaccurate since it implies that all such trail portions eventually lead to a distinguished point. For reasonably chosen values of $\theta$, the model will do, however, since the probability of ending up in cycles without distinguished points is, indeed, very small.

3.4 Statistical Analysis

Until now, we have only considered expected values for time and space. We will now have a look at the probability distributions of these.

As already explained, the time for finding a discrete logarithm with parallel Pollard-rho can be divided in two phases, the time until a collision occurs and the time needed for its detection. We will first treat those phases individually. As seen in §2.1, the probability that more than l points are needed for a collision is $p_l \approx e^{-l^2/(2N)}$. Because in time k, mk points are calculated, the probability that the time $T_1$ for the first phase is longer than k is $\Pr\{T_1 > k\} = p_{mk} \approx e^{-(mk)^2/(2N)}$. Because the time $T_2$ for the second phase of detecting a collision is geometrically distributed, the probability that $T_2 > k$ is $\Pr\{T_2 > k\} = (1 - \theta)^k$. Therefore, the probabilities that $T_1$, resp. $T_2$, are bigger than $\beta$ times their expected values are:

$$\Pr\{T_1 > \beta\sqrt{\pi N/2}/m\} \approx e^{-\beta^2\pi/4} \quad\text{and}\quad \Pr\{T_2 > \beta/\theta\} = (1 - \theta)^{\beta/\theta} \approx e^{-\beta}. \qquad (3)$$

For the probability for $T_2$, given in (3), note that $\theta$ is very small and that $\lim_{x \to 0}(1 - x)^{1/x} = e^{-1}$.

We want to avoid having to calculate exact probabilities for the overall time $T = T_1 + T_2$. Therefore, we assume that $\alpha$ in §3.2 is chosen sufficiently large to achieve a good running time. In this case, $T_1$ dominates the time T and we can approximate the probability that $T > \beta E(T)$ with $\Pr\{T_1 > \beta E(T_1)\}$. Taking Equation (3), we then get:

$$\Pr\{T > \beta E(T)\} \approx e^{-\beta^2\pi/4}. \qquad (4)$$

Table 1 gives samples of the probabilities for various values of $\beta$.

Table 1. Probabilities for the running time of Pollard’s rho algorithm.

$\beta$                      1/100    1/10     1/3      1/2      1        3/2      2        3
$\Pr\{T > \beta E(T)\}$      1.000    0.992    0.916    0.822    0.456    0.171    0.043    0.001
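The entries of Table 1 are consistent with the approximation $\Pr\{T > \beta E(T)\} \approx e^{-\beta^2\pi/4}$, which follows from $p_{mk} \approx e^{-(mk)^2/(2N)}$ evaluated at $mk = \beta\sqrt{\pi N/2}$. The short sketch below recomputes them; the function name `tail_probability` is an illustrative choice.

```python
import math


def tail_probability(beta):
    """Approximate Pr{T > beta * E(T)} as exp(-beta^2 * pi / 4)."""
    return math.exp(-beta * beta * math.pi / 4)


betas = [1 / 100, 1 / 10, 1 / 3, 1 / 2, 1, 3 / 2, 2, 3]
table = [round(tail_probability(b), 3) for b in betas]

for b, pr in zip(betas, table):
    print(f"beta = {b:.3f}: Pr = {pr:.3f}")
```

The result does not depend on N or m, so the same table applies to any group size and any number of processors, as long as the collision phase dominates the running time.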



For space, exactly the same analysis holds. In fact, the space needed is very close to $m\theta T$, where T is the actual running time. This is because the length of every iteration trail is geometrically distributed with parameter $\theta$ and the lengths of different trails are statistically independent. By application of the law of large numbers, we get that the average length of the trails is very close to the expected length.

4 Solving Multiple Instances of the DLP 

In this section, we consider the situation where one wants to solve multiple, say L, discrete logarithms in the same group (using the same generator). Hence, we have a set of L group elements $h_i = g^{x_i}$ (where $1 \le i \le L$) and we would like to find all exponents $x_i$. This can be done by solving each of the discrete logarithms individually, using the rho algorithm. A better approach, however, is to take advantage of the distinguished points gathered during the solution of the first k discrete logarithm problems using the rho algorithm, to speed up the solution of the (k+1)-st discrete logarithm. As soon as we find a discrete logarithm $x_i = \log_g h_i$, we have a representation of the form $g^c$ for all distinguished points $g^{a_j}h_i^{b_j}$ that were calculated in order to find $x_i$. The value of c is $c = (a_j + x_i b_j) \bmod N$. If we now find a collision between a distinguished point $g^c$ and a new one of the form $g^a h_k^b$, we can calculate $x_k$ as $x_k = (c - a)b^{-1} \bmod N$. This method was also suggested by Silverman and Stapleton [11], although a precise analysis has not been published. It seems obvious that the number of operations required for solving each new logarithm will become smaller if one takes advantage of information gathered during previous computations. In this section, we will provide an exact analysis for this case.
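The reuse of stored points can be illustrated with a toy computation. The sketch below assumes the same small group as a stand-in (g = 2 of order N = 11 modulo p = 23) and hand-picked exponents; it converts a point stored while solving $h_i$ into the form $g^c$ and then recovers $x_k$ from a collision with a point of the form $g^a h_k^b$. All concrete numbers and variable names are assumptions for illustration.

```python
# Toy parameters (assumptions): g = 2 has prime order N = 11 modulo p = 23.
p, N, g = 23, 11, 2

# First instance, already solved: h_i = g^x_i.
x_i = 7
h_i = pow(g, x_i, p)

# A distinguished point g^a_j * h_i^b_j stored while solving the first instance.
a_j, b_j = 3, 4
stored_point = (pow(g, a_j, p) * pow(h_i, b_j, p)) % p
c = (a_j + x_i * b_j) % N            # now stored_point == g^c

# Second instance: h_k = g^x_k with x_k unknown to the solver.
h_k = pow(g, 5, p)
a, b = 5, 3                          # a new trail reaches g^a * h_k^b
new_point = (pow(g, a, p) * pow(h_k, b, p)) % p

# A collision with the stored point yields x_k = (c - a) * b^(-1) mod N.
assert new_point == stored_point == pow(g, c, p)
x_k = ((c - a) * pow(b, -1, N)) % N
print("recovered x_k =", x_k)
```

In a real run the server would rewrite every stored triple as a pure power of g once $x_i$ is known, so that any later trail hitting one of these points immediately solves the new instance.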

The number of points we have to calculate with the rho algorithm to find L discrete logarithms is equal to the number of points we have to choose with replacement out of a set with N numbers until we have chosen L numbers at least twice (i.e. there are L collisions). We denote the expected value for the number of draws W to find L collisions by $E(W) = E_L$.

Theorem 1. We have $E_L \approx \sqrt{\pi N/2}\,\sum_{t=0}^{L-1}\binom{2t}{t}/4^t$ for $L \ll \sqrt[4]{N}$.



Proof: Suppose that an urn has N differently numbered balls. We consider an 
experiment where one uniformly draws n balls from this urn one at a time, with 
replacement, and lists the numbers. It is clear that if one obtains k < n different 
numbers after n draws, then n — k balls must have been drawn more than once 
(counting multiplicity), i.e., n — k ‘collisions’ must have occurred. We will be 
mainly interested in the probability distribution of the number of collisions as a 
function of the number of draws. 

Let Qn,k denote the probability that one obtains exactly k differently num- 
bered outcomes after n draws. For any fixed fc-set, the number of ways to choose 
precisely k differently numbered balls in n draws equals a(n,k), the number of 
surjections from an n-set to a fc-set. Hence, the number of possibilities to choose 
exactly k different balls in n draws equals (^)a(n, fc) and, therefore. 



Qn,k — 




a{n,k)/N'^ 



a(n, k) N{N — 1) ■ ■ ■ {N — k + 1) 
k\N^ N -N-'-N 



S{n, k) 

]^n-k 



where Pfc = (1 — 1/A^)(1 — 2/N) • • • (1 — (fc — 1)/N) is the probability of drawing 
k differently numbered balls in k draws, and where S{n,k) := a{n,k)/k\ is a 
Stirling number of the second type (cf. Appendix A). 

We now compute the expected number El of draws until one obtains precisely 
L collisions. Let Q^n-L denote the probability that one requires more than n 
draws in order to obtain L collisions. Hence 

L-l 

Qn,n-L ~ gn,n-t- 

t=0 



( 5 ) 
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Now, the probability that one needs exactly n draws in order to obtain L col- 
lisions is given by Q^_i n-i-L ~ Qnn-L- ^ result, the expected number of 
draws that one needs in order to obtain L collisions is given by 

OO OO 

EL=J2^(Qn-l,n-l-L-Qtn-L) = (L-l)+ ^ 

n—L n—L—1 

From equation (5) we infer that = Qn,n-L + <ln,n-L, hence one 

obtains 



El+1 — El 



^ ^ Qn,n-{L+1) ^n,n-l 



1,—L 



1—L— 1 



^ ^S{k + L,k) 

/ . Qn,n-L — 2_^ ivFc P^' 



'i—L 






NL 



(6) 

( 7 ) 



Obviously, one has Eq = 0, hence we can compute El via 

S{k + 1, k) 



L-l 



^^ = EE 



t=0 fc=0 






-Pk- 



(8) 



We will now approximate $E_L$ based upon an approximation for $E_{t+1} - E_t$ (for
$t < L$). It will turn out that the relative error of our approximation is negligible
if $L < c_N N^{1/4}$ (here $0 < c_N < 1$ is a small constant). We will use the fact that
for any fixed value of L, the Stirling number $S(k+L,k)$ is a polynomial in k of
degree 2L. More specifically, one has (cf. Lemma 1 of Appendix A) that

\[
S(k+L,k) = \frac{1}{2^L L!} \sum_{j=0}^{2L} \varphi_j(L)\, k^{2L-j}, \quad \text{where } \varphi_j(x) \in \mathbb{Q}[x] \text{ has degree at most } 2j.
\]

A substitution in Equation (6) now yields

\[
E_{L+1} - E_L = \frac{1}{2^L L!} \sum_{j=0}^{2L} \frac{\varphi_j(L)}{(\sqrt{N})^j} \sum_{k=0}^{\infty} \left(\frac{k}{\sqrt{N}}\right)^{2L-j} p_k. \tag{9}
\]



We will now approximate this expression, using approximations for $p_k$ and the
function $\varphi_j(L)$. The inner summation can be approximated, using the
approximation $p_k \approx e^{-k^2/(2N)}$ and a Riemann integral. We have

\[
\sum_{k=0}^{\infty} \left(\frac{k}{\sqrt{N}}\right)^{2L-j} p_k
\approx \sum_{k=0}^{\infty} \left(\frac{k}{\sqrt{N}}\right)^{2L-j} e^{-k^2/(2N)}
\approx \sqrt{N} \int_{x=0}^{\infty} x^{2L-j} e^{-x^2/2}\, dx = \sqrt{N}\, I_{2L-j},
\]

where $I_t$ is the value of the integral determined in Lemma 2.* Substitution of
this approximation in Equation (9) now yields



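\[
E_{L+1} - E_L \approx \frac{\sqrt{N}}{2^L L!} \sum_{j=0}^{2L} \frac{\varphi_j(L)}{(\sqrt{N})^j}\, I_{2L-j}
= \sqrt{\pi N/2}\, \binom{2L}{L} \frac{1}{4^L}\, (1 + o(1))
\approx \sqrt{\pi N/2}\, \binom{2L}{L} \frac{1}{4^L}.
\]

The latter approximation follows from the fact that $\varphi_0(L) = 1$ and that for
$j > 0$, $\varphi_j(L)$ is a polynomial in L of degree at most 2j without a constant term
and, hence, $\varphi_j(L)/(\sqrt{N})^j \approx 0$ if $L \ll \sqrt[4]{N}$. Substituting this approximation in
Equation (8), we now find that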



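\[
E_L \approx \sum_{t=0}^{L-1} \sqrt{\pi N/2}\, \binom{2t}{t} \frac{1}{4^t}
= \sqrt{\pi N/2} \sum_{t=0}^{L-1} \binom{2t}{t} \frac{1}{4^t}
= \sqrt{\pi N/2}\, (2L-1) \binom{2L-2}{L-1} \frac{1}{4^{L-1}}
\approx (2/\sqrt{\pi}) \sqrt{L}\, \sqrt{\pi N/2} = \sqrt{2LN}.
\]

□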
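The last two steps of the proof rest on the identity $\sum_{t=0}^{L-1} \binom{2t}{t} 4^{-t} = (2L-1)\binom{2L-2}{L-1} 4^{-(L-1)}$ and on the estimate $\binom{2t}{t} 4^{-t} \approx 1/\sqrt{\pi t}$. A short Python sketch (an illustration only; the function names are hypothetical) checks the identity exactly and compares the resulting estimate with $\sqrt{2LN}$.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def partial_sum(L):
    """sum_{t=0}^{L-1} C(2t, t) / 4^t, computed exactly."""
    return sum(Fraction(comb(2 * t, t), 4 ** t) for t in range(L))


def closed_form(L):
    """(2L - 1) * C(2L - 2, L - 1) / 4^(L - 1), the closed form used in the proof."""
    return Fraction((2 * L - 1) * comb(2 * L - 2, L - 1), 4 ** (L - 1))


def estimate(N, L):
    """sqrt(pi*N/2) times the exact binomial sum, i.e. the estimate before the last step."""
    return sqrt(pi * N / 2) * float(partial_sum(L))
```

Comparing `estimate(N, L)` with `sqrt(2 * L * N)` shows how quickly the final approximation step becomes accurate as L grows.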

Remark 2. The above result gives a highly accurate estimate of the expected
time required to solve multiple instances of the discrete logarithm problem in
the same underlying group. Unfortunately, this does not give direct insight into
the probability distribution hereof. We should mention, however, that the same
techniques used above to estimate expected values can also be used to estimate
the variance of the probability distribution. It turns out that the variance, when
compared to the expected time, is relatively low, especially if the number L of
discrete logarithm problems one considers is not too small. Thus, the expected
value of the running time of Theorem 1 is a good approximation of practically
observed values (for L not too small). Full details will be provided in the full
paper, space permitting.

We can conclude from Theorem 1 that computing discrete logarithms iteratively,
rather than independently, is advantageous, since the workload involved
in computing the (t+1)-st discrete logarithm, once the first t of these have been
solved, now becomes only about $1/\sqrt{\pi t}$ times as much as the workload
$\sqrt{\pi N/2}$ required for computing a single discrete logarithm. Thus, we arrive
at a total workload for computing L discrete logarithms iteratively of approximately
$\sqrt{2NL}$ group operations, which is $(2/\sqrt{\pi})\sqrt{L} \approx 1.128\sqrt{L}$ times as much
as the workload for computing a single discrete logarithm. Thus, economies of
scale apply: computing L discrete logarithms iteratively comes at an average
cost per discrete logarithm of roughly $\sqrt{2N/L}$ group operations, rather than
of approximately $\sqrt{\pi N/2}$ group operations (as is the case when computing
discrete logarithms independently). Our results hold for $0 < L < c_N N^{1/4}$, where
$0 < c_N < 1$ is some small constant.

* It turns out that the relative error of the Riemann-integral approximation used in
the proof of Theorem 1 is $O(\log(N)/\sqrt{N})$. For details, cf. Lemma 4 and its
subsequent remark.

Our extension of Pollard's rho algorithm is a generic algorithm for solving
multiple instances of the discrete logarithm problem in finite cyclic groups. The
low average workload* we obtained for computing multiple discrete logarithms
seems to be counter-intuitive, since it seems to contradict Shoup's result [10],
which gives a lower bound of $\Omega(\sqrt{N})$ group operations required by generic
algorithms solving the discrete logarithm problem in groups of prime order N.
The result is explained by observing that the low average workload is due to the
fact that solving subsequent discrete logarithm problems requires relatively few
operations, once the first few discrete logarithms have been computed. Thus, the
bottleneck remains the computation of, e.g., the first discrete logarithm, which
in our case requires roughly $\sqrt{\pi N/2} = \Omega(\sqrt{N})$ group operations. It should be
noted that Shoup's result does not apply directly, since he addresses the
scenario of a single instance of the discrete logarithm problem, rather than that of
multiple instances hereof, which we address. Thus, one cannot a priori rule out
the existence of other generic algorithms that, given L instances of the discrete
logarithm problem in a group of prime order N, solve an arbitrary one of these
using only $O(\sqrt{N/L})$ group operations.

* The average workload per discrete logarithm is $O(N^{3/8})$ group operations if one
solves $L \approx c_N N^{1/4}$ discrete logarithm problems iteratively.

5 On the Complexity of DLP-like Problems

In the previous sections, we discussed the workload required for solving multiple
instances of the discrete logarithm problem with respect to a fixed generator
of a finite cyclic group of order N, using extensions of Pollard's rho algorithm.
We found that computing discrete logarithms iteratively, rather than
independently, is advantageous. In §5.1 we will consider the problem of solving 1 out of
n instances of the discrete logarithm problem and several other relaxations of
the classical discrete logarithm problem (DLP) and consider the computational
complexity hereof. It turns out that these problems are all computationally as
hard as DLP. In particular, it follows that generic algorithms for solving each
of these relaxations of the discrete logarithm problem in a prime order group
require $\Omega(\sqrt{N})$ group operations. In §5.2 we consider the generalization of the
classical discrete logarithm problem of solving k instances hereof (coined kDLP).
Again, we consider several relaxations of this so-called kDLP and discuss their
computational complexity. It turns out that, similar to the case k = 1, these
problems are all computationally as hard as solving kDLP. We end the section
with a conjectured lower bound $\Omega(\sqrt{kN})$ on the complexity of generic
algorithms for solving kDLP, which - if true - would generalize Shoup's result for
DLP towards kDLP.



5.1 Complexity of Solving 1 of Multiple Instances of the DLP 

We consider the following variations of the discrete logarithm problem:

1. (DLP-1) Solving a single instance of the discrete logarithm problem:

System: Cyclic group G; generator g for G.

Input: Group element $h \in G$.

Output: Integer x such that $h = g^x$.

2. (DLP-2) Solving a single instance of the discrete logarithm problem (selected
arbitrarily from a set of n instances of the discrete logarithm problem):

System: Cyclic group G; generator g for G.

Input: Group elements $h_1, \ldots, h_n \in G$.

Output: Pair $(j, x_j)$ such that $h_j = g^{x_j}$ and such that $1 \le j \le n$.

3. (DLP-3) Finding a discrete logarithm with respect to an arbitrary basis
element (selected from a set of m basis elements):

System: Cyclic group G; arbitrary generators $g_1, \ldots, g_m$ for G.

Input: Group element $h \in G$.

Output: Pair $(i, x)$ such that $h = g_i^x$ and such that $1 \le i \le m$.

4. (DLP-4) Finding a linear equation in terms of the discrete logarithms of all
group elements of a set of n instances of the discrete logarithm problem:

System: Cyclic group G; generator g for G.

Input: Group elements $h_1, \ldots, h_n \in G$.

Output: A linear equation $\sum_{i=1}^{n} a_i \log_g h_i = b$ (with known values of
$a_1, \ldots, a_n$ and b).

5. (DLP-5) Finding the difference of two discrete logarithms (selected
arbitrarily from a set of n instances of the discrete logarithm problem):

System: Cyclic group G; generator g for G.

Input: Group elements $h_1, \ldots, h_n \in G$.

Output: Triple $(i, j, \log_g h_i - \log_g h_j)$, where $0 \le i \ne j \le n$ and where
$h_0 := g$.

The following theorem relates the expected workloads required by optimal
algorithms for solving the discrete logarithm problem (DLP-1) and for solving
arbitrarily 1 out of n instances of the discrete logarithm problem (DLP-2).

Theorem 2. Let $T_{DLP}$, resp. $T_{DLP(1:n)}$, be the expected workload of an optimal
algorithm for solving the discrete logarithm problem, resp. for solving arbitrarily 1
out of n instances of the discrete logarithm problem. Then, one has
$T_{DLP(1:n)} \le T_{DLP} \le T_{DLP(1:n)} + n$ (group operations).

Proof: The inequality $T_{DLP} \le T_{DLP(1:n)} + n$ follows from a reduction of an
instance of the DLP to an instance of the DLP(1:n). The other inequality follows
from the observation that DLP = DLP(1:1). Let $h := g^x$ be a problem instance
of DLP. We will reduce this to a problem instance $h_1, \ldots, h_n$ of DLP(1:n) as
follows: for all i, $1 \le i \le n$, select the numbers $r_i$ uniformly at random from the
set $\{0, \ldots, N-1\}$ and define $h_i := g^{r_i} h = g^{x + r_i}$. Note that all $h_i$ are random,
since all $x + r_i$ are random and independent. Now apply an oracle that solves
DLP(1:n), to produce an output $(j, x_j)$, with $x_j := \log_g h_j$ and with $1 \le j \le n$.
Since $h_j = g^{r_j} h$ and since $r_j$ is known, we get the required discrete logarithm x
as $x = x_j - r_j \pmod{N}$. □
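The reduction in the proof can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the toy group is the order-509 subgroup of the integers modulo 1019 generated by 4, the oracle is a brute-force stand-in that picks one instance itself, and the names `oracle_1_of_n` and `solve_dlp_via_oracle` are hypothetical. Indices are 0-based, unlike the 1-based indices of the text.

```python
import random

P = 1019  # toy prime modulus (assumption for illustration)
N = 509   # prime order of the subgroup generated by G, since P = 2*N + 1
G = 4     # 4 = 2^2 is a nontrivial square mod P, hence has order N


def oracle_1_of_n(hs):
    """Stand-in DLP(1:n) oracle: solves one instance of its own choosing by brute force."""
    j = random.randrange(len(hs))
    x, acc = 0, 1
    while acc != hs[j]:
        acc = acc * G % P
        x += 1
    return j, x


def solve_dlp_via_oracle(h, n, oracle=oracle_1_of_n):
    """Recover x with h = G^x using a single call to a DLP(1:n) oracle."""
    rs = [random.randrange(N) for _ in range(n)]   # blinding exponents r_i
    hs = [pow(G, r, P) * h % P for r in rs]        # h_i = g^{r_i} * h = g^{x + r_i}
    j, xj = oracle(hs)                             # oracle returns (j, log_g h_j)
    return (xj - rs[j]) % N                        # x = x_j - r_j (mod N)
```

Because every $h_i$ is uniformly distributed in the subgroup, the oracle gains nothing from the way the instances were generated, which is the point of the blinding step.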

Corollary 1. The problem of solving arbitrarily 1 out of n instances of the
discrete logarithm is computationally as hard as solving the discrete logarithm
problem, provided $n \ll T_{DLP}$. Moreover, any generic algorithm that solves this
problem in a group of prime order N requires at least $\Omega(\sqrt{N})$ group operations.

Proof: The bound $T_{DLP(1:n)} = \Omega(T_{DLP})$ follows from Theorem 2 and the
inequality $n \ll T_{DLP}$. The lower bound on the required workload for a generic
algorithm that solves* the relaxed discrete logarithm problem DLP(1:n) in groups
of prime order N follows from the corresponding result for the discrete logarithm
problem [10]. □

* with a probability bounded away from zero

Remark 3. In fact, one can show that Theorem 2 and Corollary 1 easily
generalize to each of the problems DLP-1, ..., DLP-5 above. In particular, one has
that each of the problems DLP-1, ..., DLP-5 is as hard as solving the discrete
logarithm problem, provided $n, m \ll T_{DLP}$. Moreover, any generic algorithm
that solves any of the problems DLP-1, ..., DLP-5 in a group of prime order
N requires at least $\Omega(\sqrt{N})$ group operations.

Remark 4. One can show that each of the problems DLP-1, ..., DLP-5 can be
solved directly using Pollard's rho algorithm, with starting points of the
randomized walks that are tailored to the specific problem at hand. In each case,
the resulting workload is roughly $\sqrt{\pi N/2}$ group operations.

5.2 Complexity of Solving Multiple Instances of the DLP

In the previous section, we related the workload involved in solving various
relaxations of the classical discrete logarithm problem. The main result was that
the problem of solving arbitrarily 1 out of n instances of the discrete logarithm
is computationally as hard as solving a single instance of the discrete logarithm
problem. In this section, we consider the similar problem where we are faced
with solving k given instances of the discrete logarithm problem.

We consider the following variations of the discrete logarithm problem kDLP:

1. (kDLP-1) Solving k instances of the discrete logarithm problem:

System: Cyclic group G; generator g for G.

Input: Group elements $h_1, \ldots, h_k \in G$.

Output: k pairs $(i, x_i)$ such that $h_i = g^{x_i}$.

2. (kDLP-2) Solving k instances of the discrete logarithm problem (selected
arbitrarily from a set of n instances of the discrete logarithm problem):

System: Cyclic group G; generator g for G.

Input: Group elements $h_1, \ldots, h_n \in G$.

Output: k pairs $(j, x_j)$ such that $h_j = g^{x_j}$, where $j \in J$ and where J is
a k-subset of $\{1, \ldots, n\}$.

3. (kDLP-3) Finding k discrete logarithms with respect to k arbitrary basis
elements (selected from a set of m basis elements):

System: Cyclic group G; arbitrary generators $g_1, \ldots, g_m$ for G.

Input: Group element $h \in G$.

Output: k pairs $(i, x_i)$ such that $h = g_i^{x_i}$, where $i \in I$ and where I is a
k-subset of $\{1, \ldots, m\}$.

4. (kDLP-4) Finding k linear equations in terms of the discrete logarithms of
all group elements of a set of n instances of the discrete logarithm problem:

System: Cyclic group G; generator g for G.

Input: Group elements $h_1, \ldots, h_n \in G$.

Output: A set of k linear equations $\sum_{j=1}^{n} a_{ij} \log_g h_j = b_i$ (with known
values of $a_{ij}$ and $b_i$).

5. (kDLP-5) Finding k differences of two discrete logarithms (selected
arbitrarily from a set of n instances of the discrete logarithm problem):

System: Cyclic group G; generator g for G.

Input: Group elements $h_1, \ldots, h_n \in G$.

Output: A set of k triples $(i, j, \log_g h_i - \log_g h_j)$, where $0 \le i \ne j \le n$
and where $h_0 := g$.

One can show that the results of the previous subsection carry over to this
section, as follows:

— Each of the problems kDLP-1, ..., kDLP-5 is as hard as solving k instances
of the discrete logarithm problem, provided $kn, km \ll T_{DLP}$.

— Any generic algorithm for solving k instances of the discrete logarithm
problem in a group of prime order N requires at least $\Omega(\sqrt{N})$ group operations.

— Each of the problems kDLP-1, ..., kDLP-5 can be solved directly using the
extension of Pollard's rho algorithm presented in §4, with starting points of
the randomized walks that are tailored to the specific problem at hand. In
each case, the resulting workload is roughly $\sqrt{2Nk}$ group operations.

The proofs use constructions based on maximum distance separable codes (cf.,
e.g., [5]). Details will be included in the full paper.

The lower bound on the required workload for a generic algorithm that solves
k instances of the discrete logarithm problem is not very impressive: it derives
directly from Shoup's lower bound $\Omega(\sqrt{N})$ for solving a single discrete logarithm
(i.e., k = 1). It would be of interest to find a stronger lower bound in this case.
Based on the workload involved in computing k discrete logarithm problems
iteratively, we postulate that the 'true' lower bound is $\Omega(\sqrt{kN})$. We suggest this
as an open problem.

Research Problem. Show that any generic algorithm that solves, with a
probability bounded away from zero, k instances of the discrete logarithm problem
in groups of prime order N requires at least $\Omega(\sqrt{kN})$ group operations.

A Stirling Numbers

In this section, we introduce Stirling numbers. These numbers play an important 
role in several parts of combinatorics. We mention several properties of these 
numbers that will be used in the paper. For a detailed discussion, we refer to [4]. 

In the sequel, n and k denote non-negative integers. Let $a(n,k)$ denote the
number of surjections from an n-set to a k-set. Obviously, one has $a(n,k) = 0$
if $k > n$ and $a(n,k) = n!$ if $n = k$. In general, one can use the principle of
inclusion-exclusion to show that

\[
a(n,k) = \sum_{i=0}^{k} (-1)^i \binom{k}{i} (k-i)^n.
\]

Let $S(n,k)$ denote the number of ways to partition an n-set into k nonempty
subsets. The numbers $S(n,k)$ are called Stirling numbers of the second kind.
Obviously, one has that $S(n,k) = a(n,k)/k!$. Moreover, by a simple counting
argument, one can show that

\[
S(n,k) = \sum_{\substack{1a_1 + 2a_2 + \cdots + na_n = n \\ a_1 + a_2 + \cdots + a_n = k}}
\frac{n!}{(1!)^{a_1} (2!)^{a_2} \cdots (n!)^{a_n}\, a_1!\, a_2! \cdots a_n!}. \tag{10}
\]
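The inclusion-exclusion formula and the relation $S(n,k) = a(n,k)/k!$ translate directly into code. The following Python sketch is illustrative only; the function names are hypothetical.

```python
from math import comb, factorial


def surjections(n, k):
    """a(n, k): number of surjections from an n-set onto a k-set, by inclusion-exclusion."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))


def stirling2(n, k):
    """S(n, k) = a(n, k) / k!: Stirling number of the second kind."""
    return surjections(n, k) // factorial(k)
```

The integer division is exact, because every partition of an n-set into k nonempty blocks corresponds to exactly k! surjections.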



Our main interest is in those Stirling numbers $S(n,k)$ for which $n - k$ is relatively
small. The first few of these are

\begin{align*}
S(k,k) &= 1; \\
S(k+1,k) &= \tfrac{1}{2}(k+1)k = \tfrac{1}{2}(k^2 + k); \\
S(k+2,k) &= \tfrac{1}{8}(k+2)(k+1)k\,(k + \tfrac{1}{3}) = \tfrac{1}{8}\left(k^4 + \tfrac{10}{3}k^3 + 3k^2 + \tfrac{2}{3}k\right); \\
S(k+3,k) &= \tfrac{1}{48}(k+3)(k+2)(k+1)k\,(k^2 + k) \\
         &= \tfrac{1}{48}\left(k^6 + 7k^5 + 17k^4 + 17k^3 + 6k^2\right); \\
S(k+4,k) &= \tfrac{1}{384}(k+4)(k+3)(k+2)(k+1)k\,\left(k^3 + 2k^2 + \tfrac{1}{3}k - \tfrac{2}{15}\right) \\
         &= \tfrac{1}{384}\left(k^8 + 12k^7 + \tfrac{166}{3}k^6 + \tfrac{616}{5}k^5 + \tfrac{403}{3}k^4 + 60k^3 + \tfrac{4}{3}k^2 - \tfrac{16}{5}k\right).
\end{align*}

Computing $S(k+L,k)$ for bigger values of L is quite cumbersome. Fortunately,
however, one can express $S(k+L,k)$ as a polynomial in k with coefficients in L,
as is demonstrated by the following lemma.



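Lemma 1. For all $k, L \ge 0$, one has $S(k+L,k) = \frac{1}{2^L L!} \sum_{j=0}^{2L} \varphi_j(L)\, k^{2L-j}$, where
$\varphi_j(x) \in \mathbb{Q}[x]$ has degree at most 2j ($0 \le j \le 2L$). For $j > 0$, one has $x \mid \varphi_j(x)$.
The first few coefficients are

\[
\varphi_0(x) = 1, \quad \varphi_1(x) = \tfrac{2}{3}x^2 + \tfrac{1}{3}x, \quad
\varphi_2(x) = \tfrac{2}{9}x^4 - \tfrac{1}{18}x^2 - \tfrac{1}{6}x.
\]

Proof: To be provided in the full version of this paper. □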
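For a given L, the coefficients $\varphi_j(L)$ can be recovered numerically by interpolating the polynomial $2^L L!\, S(k+L,k)$ in k. The Python sketch below is an illustration under that approach and not the proof deferred to the full paper; the names `stirling2_shift`, `poly_coeffs` and `phi` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def stirling2_shift(L, k):
    """S(k+L, k), via S(k+t, k) = k*S(k+t-1, k) + S(k+t-1, k-1)."""
    row = [1] * (k + 1)                  # t = 0: S(i, i) = 1
    for _ in range(L):
        nxt = [0] * (k + 1)
        for i in range(1, k + 1):
            nxt[i] = i * row[i] + nxt[i - 1]
        row = nxt
    return row[k]


def poly_coeffs(xs, ys):
    """Coefficients (lowest degree first) of the Lagrange interpolant through (xs, ys)."""
    n = len(xs)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis, denom = [Fraction(1)], Fraction(1)
        for j in range(n):
            if j == i:
                continue
            new = [Fraction(0)] * (len(basis) + 1)
            for d, c in enumerate(basis):    # multiply basis by (x - xs[j])
                new[d + 1] += c
                new[d] -= c * xs[j]
            basis = new
            denom *= xs[i] - xs[j]
        for d, c in enumerate(basis):
            coeffs[d] += ys[i] * c / denom
    return coeffs


def phi(L):
    """[phi_0(L), ..., phi_{2L}(L)] with 2^L L! S(k+L,k) = sum_j phi_j(L) k^(2L-j)."""
    xs = list(range(2 * L + 1))          # 2L+1 nodes fix a polynomial of degree 2L
    ys = [Fraction(2 ** L * factorial(L) * stirling2_shift(L, k)) for k in xs]
    c = poly_coeffs(xs, ys)
    return [c[2 * L - j] for j in range(2 * L + 1)]
```

Evaluating `phi(L)` for several small L and comparing the entry at index 1 with $\tfrac{2}{3}L^2 + \tfrac{1}{3}L$ gives a quick consistency check of the stated form of $\varphi_1$.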



B Approximations of Combinatorial Expressions 



In this section, we provide approximations of some combinatorial expressions 
that will be used in the paper and indicate the accuracy of these approximations. 

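Lemma 2. For all $t \ge 0$, define $I_t := \int_{x=0}^{\infty} x^t e^{-x^2/2}\, dx$. Then

\[
I_{2t} = \frac{(2t)!}{t!\, 2^t} \sqrt{\pi/2}, \qquad I_{2t-1} = (t-1)!\, 2^{t-1} \quad (t \ge 1).
\]

Proof: The result follows using partial integration and an induction argument.
For $t > 1$, one has $I_t = (t-1) I_{t-2}$, since

\[
I_t = \int_{x=0}^{\infty} x^t e^{-x^2/2}\, dx = -\int_{x=0}^{\infty} x^{t-1}\, d(e^{-x^2/2})
    = \left[-x^{t-1} e^{-x^2/2}\right]_{x=0}^{\infty} + (t-1) \int_{x=0}^{\infty} x^{t-2} e^{-x^2/2}\, dx = (t-1)\, I_{t-2}.
\]

Moreover, $I_0 = \sqrt{\pi/2}$ and $I_1 = 1$, since

\[
I_0^2 = \int_{x=0}^{\infty} \int_{y=0}^{\infty} e^{-(x^2+y^2)/2}\, dx\, dy
      = \int_{\varphi=0}^{\pi/2} \int_{r=0}^{\infty} e^{-r^2/2}\, r\, dr\, d\varphi = \pi/2
\quad \text{and} \quad
I_1 = -\int_{x=0}^{\infty} d(e^{-x^2/2}) = 1.
\]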
The result now follows using induction. □

Lemma 3. Let $N > 0$, let $t \in \mathbb{N}$. Then

\[
\frac{1}{\sqrt{N}} \sum_{k=0}^{\infty} \left(\frac{k}{\sqrt{N}}\right)^t e^{-k^2/(2N)} \to I_t \quad (N \to \infty).
\]

Proof: The result follows from the observation that the expression is a Riemann
sum that converges to $I_t$. □



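Lemma 4. Let $N > 0$, let $t \in \mathbb{N}$, and let $p_k := \prod_{i=0}^{k-1} (1 - i/N)$. Then

\[
\frac{1}{\sqrt{N}} \sum_{k=0}^{\infty} \left(\frac{k}{\sqrt{N}}\right)^t p_k \to I_t \quad (N \to \infty).
\]

Proof: The result follows from Lemma 3, using the estimate $p_k \approx e^{-k^2/(2N)}$ and
upper bounding the approximation error in the 'tail' of the summation. Details
will be provided in the full paper. □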



Remark 5. One can show that convergence is as follows:

\[
\frac{1}{\sqrt{N}} \sum_{k=0}^{\infty} \left(\frac{k}{\sqrt{N}}\right)^t p_k = (1 + \varepsilon)\, I_t, \quad \text{where } |\varepsilon| \in O(\log(N)/\sqrt{N}).
\]

Hence, for big values of N (as in our applications) the approximation of the
expression by $I_t$ is almost precise.
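The closed forms of Lemma 2 and the convergence claimed in Lemma 4 can be checked numerically. The Python sketch below is an illustration only; the function names are hypothetical and floating-point arithmetic is assumed to be accurate enough for the moderate values of N and t used here.

```python
import math


def I_closed(t):
    """Closed form of I_t = integral over x >= 0 of x^t * exp(-x^2/2), as in Lemma 2."""
    if t % 2 == 0:
        s = t // 2
        return math.factorial(t) / (math.factorial(s) * 2 ** s) * math.sqrt(math.pi / 2)
    s = (t + 1) // 2                     # t = 2s - 1
    return math.factorial(s - 1) * 2 ** (s - 1)


def riemann_sum(N, t):
    """(1/sqrt(N)) * sum_k (k/sqrt(N))^t * p_k, with p_k = prod_{i<k} (1 - i/N)."""
    root = math.sqrt(N)
    total, p = 0.0, 1.0                  # p starts at p_0 = 1
    for k in range(N + 1):               # p_k = 0 for k > N
        total += (k / root) ** t * p
        p *= 1.0 - k / N                 # p_{k+1} = p_k * (1 - k/N)
    return total / root
```

For instance, the ratio `riemann_sum(N, t) / I_closed(t)` can be tabulated for growing N to observe the convergence described in Remark 5.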
Abstract. For cryptographic applications, normal bases have received
considerable attention, especially for hardware implementation. In this
article, we consider fast software algorithms for normal basis multiplication
over the extended binary field GF(2^m). We present a vector-level
algorithm which essentially eliminates the bit-wise inner products needed 
in the conventional approach to the normal basis multiplication. We then 
present another algorithm which significantly reduces the dynamic in- 
struction counts. Both algorithms utilize the full width of the data-path 
of the general purpose processor on which the software is to be exe- 
cuted. We also consider composite fields and present an algorithm which 
can provide further speed-up and an added flexibility toward hardware- 
software co-design of processors for very large finite fields. 

Keywords: Finite field multiplication, normal basis, software algo- 
rithms, ECDSA, composite fields. 



1 Introduction 

The extended binary finite field $GF(2^m)$ of degree m is used in important
cryptographic operations, such as key exchange, signing and verification. For today's
security applications the minimum values of m are considered to be 160 in
elliptic curve cryptography and 1024 in standard discrete log based
cryptography. Elliptic curve crypto-systems use relatively smaller field sizes, but require
a considerable amount of field arithmetic for each group operation (i.e., addition of
two points). In such crypto-systems, often the most complicated and expensive
module is the finite field arithmetic unit. As a result, it is important to
develop suitable finite field arithmetic algorithms and architectures that can meet
the constraints of various implementation technologies, such as hardware and
software.



S. Vaudenay and A. Youssef (Eds.): SAC 2001, LNCS 2259, pp. 230-244, 2001.

For cryptographic applications, the most frequently used $GF(2^m)$ arithmetic
operations are addition and multiplication. Compared to the former, the latter
is a much more complicated and time-consuming operation. The complexity of
$GF(2^m)$ multiplication depends very much on how the field elements are
represented. For hardware implementation of a multiplier, the use of normal bases
has received considerable attention and a number of hardware architectures and
implementations have been reported (see for example [1], [2], [7], [20]). A
majority of such efforts were motivated by the fact that certain normal bases, e.g.,
optimal bases, yield area efficient multipliers, and that the field squaring, which
is heavily used in exponentiation and Frobenius mapping, is a simple cyclic shift
of the field element's coordinates and hence in hardware it is almost free of cost.
However, the task of implementing a normal basis multiplier in hardware poses a
number of challenges. For example, when one has to deal with very large fields,
the interconnections among the various parts of the multiplier could be quite
irregular, which may slow down the clock speed. Also, normal basis multipliers
are not easily scalable with m. Given a normal basis multiplier designed for
$GF(2^{233})$, one cannot conveniently make it usable for $GF(2^{163})$ or $GF(2^{283})$.

Unlike hardware, so far software implementation of a $GF(2^m)$ multiplier
using normal bases has not been very efficient. This is mainly due to a number
of practical considerations. Most importantly, normal basis multiplication
algorithms require inner products or matrix multiplications over the ground field
GF(2). Such computations are not directly supported by most of today's
general purpose processors. These computations require bit-by-bit logical AND and
XOR operations, which are not efficiently implemented using the instruction set
supported by the processors. Also, when a high level programming language,
such as C, is used, the cyclic shifts needed for field squaring operations are not
as efficient as they are in hardware.

In this article, we consider algorithms for fast software normal basis multipli- 
cation on general purpose processors. We discuss how the conventional bit-level 
algorithm for normal basis multiplication fails to utilize the full data-path of the 
processor and makes its software implementation inefficient. We then present a 
vector-level normal basis multiplication algorithm which eliminates the matrix 
multiplication over GF(2) and significantly reduces the number of dynamic in- 
structions. We then derive another scheme for normal basis multiplication to 
further improve the speed. We also consider normal basis multiplication over 
certain special classes of composite fields. We show that normal basis multipli- 
ers over such composite fields can provide an additional speed-up and a great 
deal of flexibility toward hardware-software co-design of very large finite field 
processors. 

2 Preliminaries 

2.1 Normal Basis Representation 

It is well known that there exists a normal basis (NB) in the field $GF(2^m)$ over
GF(2) for all positive integers m. By finding an element $\beta \in GF(2^m)$ such that
$\{\beta, \beta^2, \ldots, \beta^{2^{m-1}}\}$ is a basis of $GF(2^m)$ over GF(2), any element $A \in GF(2^m)$
can be represented as $A = \sum_{i=0}^{m-1} a_i \beta^{2^i} = a_0\beta + a_1\beta^2 + \cdots + a_{m-1}\beta^{2^{m-1}}$, where
$a_i \in GF(2)$, $0 \le i \le m-1$, is the i-th coordinate of A. In this article, this normal
basis representation of A will be written in short as $A = (a_0, a_1, \ldots, a_{m-1})$. In
vector notation, element A will be written as $A = \mathbf{a} \cdot \boldsymbol{\beta}^T = \boldsymbol{\beta} \cdot \mathbf{a}^T$, where
$\mathbf{a} = [a_0, a_1, \ldots, a_{m-1}]$, $\boldsymbol{\beta} = [\beta, \beta^2, \ldots, \beta^{2^{m-1}}]$, and T denotes vector transposition.

Now, consider the following matrix

\[
\mathbf{M} = \boldsymbol{\beta}^T \cdot \boldsymbol{\beta} = \left[ \beta^{2^i + 2^j} \right]_{i,j=0}^{m-1}, \tag{1}
\]

whose entries belong to $GF(2^m)$. Writing these entries with respect to the NB,
one obtains the following.

\[
\mathbf{M} = \mathbf{M}_0 \beta + \mathbf{M}_1 \beta^2 + \cdots + \mathbf{M}_{m-1} \beta^{2^{m-1}}, \tag{2}
\]

where the $\mathbf{M}_i$'s are m x m multiplication matrices whose entries belong to GF(2).
Let $H(\mathbf{M}_i)$, $0 \le i \le m-1$, be the number of 1's (or Hamming weight) of $\mathbf{M}_i$.

It is easy to verify that $H(\mathbf{M}_0) = H(\mathbf{M}_1) = \cdots = H(\mathbf{M}_{m-1})$. The number of
logic gates needed for the implementation of a NB multiplier depends on $H(\mathbf{M}_i)$,
which is referred to as the complexity of the normal basis. Let us denote this
complexity as $C_N$. It was shown in [12] that $C_N \ge 2m - 1$. When $C_N = 2m - 1$,
the NB is called an optimal normal basis (ONB). 

Two types of ONBs were constructed by Mullin et al. [12]. Gao and Lenstra
[5] showed that these two types are all the ONBs in $GF(2^m)$. As an extension
of the work on ONBs, Ash et al. in [3] proposed low complexity normal bases of
type t where t is a positive integer. These low complexity bases are referred to as
Gaussian Normal Bases (GNBs). When t = 1 and 2, the GNBs become the two
types of ONBs of [3]. A type t GNB for $GF(2^m)$ exists if and only if p = tm + 1
is prime and $\gcd(tm/k, m) = 1$, where k is the multiplicative order of 2 modulo p
[8]. More on this can be found in [3].

2.2 Conventional NB Multiplication Algorithm 

Below we give the conventional normal basis multiplication algorithm as
described by NIST in [13]. This algorithm is for t even only (the reader is referred
to [8] for the algorithm with t odd). The case of t even is of particular interest for
implementing high speed crypto-systems based on Koblitz curves. Such curves
with points over $GF(2^m)$ exist for m = 163, 233, 283, 409, 571, where normal
bases have t even. Note that in the following algorithm, p = tm + 1, and $A \ll i$
(resp. $A \gg i$) denotes i-fold left (resp. right) cyclic shifts of the coordinates of
A. The algorithm requires the input sequence $F(1), F(2), \ldots, F(p-1)$ to be
pre-computed using

\[
F(2^i u^j \bmod p) = i, \quad 0 \le i \le m-1, \ 0 \le j < t, \tag{3}
\]

where u is an integer of order t mod p.

Algorithm 1 (Bit-Level NB Multiplication)

Input: $A, B \in GF(2^m)$, $F(n) \in [0, m-1]$ for $1 \le n \le p-1$

Output: C = AB

1. Initialize $C = (c_0, c_1, \ldots, c_{m-1}) := 0$

2. For i = 0 to m − 1 {

3.   For n = 1 to p − 2 {

4.     $c_i := c_i + a_{F(n+1)}\, b_{F(p-n)}$

5.   }

6.   $A \ll 1$, $B \ll 1$

7. }
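Algorithm 1 and the table F of (3) can be sketched in Python as follows. This is an illustration, not the NIST reference code: the function names are hypothetical, coordinates are stored as lists of bits, and it is assumed that t is even and that a type t GNB exists for the chosen m (so that p = tm + 1 is prime).

```python
def element_of_order(t, p):
    """Find an integer u of multiplicative order exactly t modulo the prime p."""
    for x in range(2, p):
        u = pow(x, (p - 1) // t, p)
        if all(pow(u, d, p) != 1 for d in range(1, t)):
            return u
    raise ValueError("no element of order t found")


def build_F(m, t):
    """Return (p, F) with F[2^i * u^j mod p] = i for 0 <= i < m, 0 <= j < t, as in (3)."""
    p = t * m + 1
    u = element_of_order(t, p)
    F = {}
    for i in range(m):
        for j in range(t):
            F[pow(2, i, p) * pow(u, j, p) % p] = i
    if len(F) != p - 1:
        raise ValueError("no type-t GNB for this m")
    return p, F


def nb_multiply_bitlevel(a, b, m, t):
    """Algorithm 1: a, b are lists of m bits (a_0..a_{m-1}); returns the bits of C = AB."""
    p, F = build_F(m, t)
    a, b = list(a), list(b)
    c = [0] * m
    for i in range(m):
        for n in range(1, p - 1):                 # n = 1 .. p-2
            c[i] ^= a[F[n + 1]] & b[F[p - n]]     # line 4: one AND and one XOR per step
        a = a[1:] + a[:1]                         # line 6: A << 1 (left cyclic shift)
        b = b[1:] + b[:1]                         #         B << 1
    return c
```

With m = 5 and t = 2 (so p = 11), multiplying any element by the all-ones vector, which represents the field element 1 in a normal basis, returns the element unchanged.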

Software implementation of Algorithm 1 is not very efficient for the following
reasons. First, in each execution of line 4, one coordinate of each of A and
B is accessed. These accesses are such that their software implementation is
rather unsystematic and typically requires more than one instruction. Secondly,
in line 4 the mod 2 multiplication of the coordinates, which is implemented by a
bit level logical AND operation, is performed $m(p-2)$ times in total, and the
mod 2 addition, which is implemented by a bit level logical XOR operation, is
performed $\frac{1}{2}m(p-2)$ times, on average, assuming that A and B are two random
inputs. In the C programming language, these mod 2 multiplication and addition
operations correspond to about $m(p-2)$ AND and $\frac{1}{2}m(p-2)$ XOR instructions¹,
respectively.

¹ These are dynamic instructions which the underlying processor needs to execute.

3 Vector-Level NB Multiplication

In this section we discuss improvements to Algorithm 1 so that normal basis
multiplication can be efficiently implemented in software. One crucial
improvement is that most arithmetic operations are done on vectors instead of bits.
This enables us to use the full data-path of the processor on which the software
is executed. The assumption that t is even in Algorithm 1 is also used in the
remaining discussion of this section.

Lemma 1. For GNB of type t, where t is even, the sequence F(n) of p − 1
integers as defined above is mirror symmetric around the center, i.e., $F(n) =
F(p-n)$, $1 \le n \le p-1$.

Proof. In (3), t is the smallest nonzero integer such that $u^t \bmod p = 1$. Then
$u^{t/2} \bmod p$ must be equal to −1. For $0 \le i \le m-1$ and $0 \le j \le t-1$, let
$n = 2^i u^j \bmod p$. Then $F(n) = F(2^i u^j \bmod p) = i$. Also, $F(2^i u^{j+t/2} \bmod p) = i$.
Thus $F(n) = F(2^i u^{j+t/2} \bmod p) = F(-2^i u^j \bmod p) = F(p-n)$. □

From (3) and Lemma 1, one has $F(1) = F(p-1) = 0$. For $1 \le n \le p-2$,
let us define

\[
\Delta F(n) = F(n+1) - F(n) \bmod m. \tag{4}
\]

Now we have the following corollary.

Corollary 1. For $\Delta F(n)$ as defined above and for t even, the following holds

\[
\Delta F(p-n) = m - \Delta F(n-1) \bmod m, \quad 2 \le n \le p-1.
\]

Proof. Using (4), one obtains $F(n+1) = \sum_{i=1}^{n} \Delta F(i) \bmod m$. Applying Lemma 1 into
(4), one can also write $\Delta F(p-n) = -\Delta F(n-1) \bmod m$, $2 \le n \le p-1$, which
results in $\Delta F(p-n) = m - \Delta F(n-1)$, $2 \le n \le \frac{p-1}{2}$, and $\Delta F(\frac{p-1}{2}) = 0$. □

In Algorithm 1, the i-th coordinate of the product C = AB is computed in
its inner loop, which can be written as follows

\[
c_i = \sum_{n=1}^{p-2} a_{F(n+1)+i}\, b_{F(p-n)+i}, \quad 0 \le i \le m-1. \tag{5}
\]

Using Lemma 1 and equation (4), one can write

\begin{align}
c_i &= \sum_{n=1}^{p-2} a_{F(n+1)+i}\, b_{F(n)+i}, \quad 0 \le i \le m-1, \tag{6} \\
    &= \sum_{n=1}^{p-2} a_{F(n)+\Delta F(n)+i}\, b_{F(n)+i}, \quad 0 \le i \le m-1. \tag{7}
\end{align}

For a particular GNB, the values of $\Delta F(n)$, $1 \le n \le p-2$, are fixed and are
to be determined only once, i.e., at the time of choosing the basis. Additionally,
Corollary 1 implies that it is sufficient to store only half (i.e., $\frac{p-1}{2}$) of these
$\Delta F(n)$'s. We now state the vector-level algorithm for t even as follows. A similar
algorithm for odd values of t is given in [18].

Algorithm 2 (Vector-Level NB Multiplication)

Input: $A, B \in GF(2^m)$, $\Delta F(n) \in [0, m-1]$, $1 \le n \le p-2$

Output: C = AB

1. Initialize $S_A := A$, $S_B := B$, $C := 0$

2. For n = 1 to p − 2 {

3.   $S_A \ll \Delta F(n)$

4.   $R := S_A \odot S_B$

5.   $C := C + R$

6.   $S_B \ll \Delta F(n)$

7. }
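A compact Python sketch of Algorithm 2 follows. It is illustrative only: field elements are held in Python integers with bit i storing coordinate i (so a left cyclic shift of the coordinates is a right rotation of the bits), the function names are hypothetical, and t is assumed even with a type t GNB existing for m.

```python
def element_of_order(t, p):
    """Find an integer u of multiplicative order exactly t modulo the prime p."""
    for x in range(2, p):
        u = pow(x, (p - 1) // t, p)
        if all(pow(u, d, p) != 1 for d in range(1, t)):
            return u
    raise ValueError("no element of order t found")


def delta_F(m, t):
    """List of Delta F(n) = F(n+1) - F(n) mod m for n = 1 .. p-2, as in (4)."""
    p = t * m + 1
    u = element_of_order(t, p)
    F = {pow(2, i, p) * pow(u, j, p) % p: i for i in range(m) for j in range(t)}
    if len(F) != p - 1:
        raise ValueError("no type-t GNB for this m")
    return [(F[n + 1] - F[n]) % m for n in range(1, p - 1)]


def rotl(x, s, m):
    """Left cyclic shift of coordinates: new coordinate i equals old coordinate i+s."""
    s %= m
    return ((x >> s) | (x << (m - s))) & ((1 << m) - 1)


def nb_multiply_vector(A, B, m, t):
    """Algorithm 2: whole-vector AND/XOR replaces the bit-by-bit inner loop."""
    SA, SB, C = A, B, 0
    for d in delta_F(m, t):
        SA = rotl(SA, d, m)      # line 3: S_A << Delta F(n)
        C ^= SA & SB             # lines 4-5: R := S_A AND S_B; C := C + R
        SB = rotl(SB, d, m)      # line 6: S_B << Delta F(n)
    return C
```

In a real implementation the table returned by `delta_F` would be precomputed once per basis rather than on every call; it is recomputed here only to keep the sketch self-contained.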

In line 4 of Algorithm 2, for $X, Y \in GF(2^m)$, $X \odot Y$ denotes the bit-wise
AND operation between coordinates of X and Y, i.e., $X \odot Y = (x_0 y_0, x_1 y_1, \ldots,
x_{m-1} y_{m-1})$. In order to obtain an overall computation time for a $GF(2^m)$
multiplication using Algorithm 2, the coordinates of the field elements can be divided
into $\lceil m/\omega \rceil$ units, where $\omega$ corresponds to the data-path width of the processor.
We assume that the processor can perform bit-wise XOR and AND of two $\omega$-bit
operands using one single XOR instruction and one single AND instruction,
respectively. Since the loop in Algorithm 2 has p − 2 iterations, the total numbers of
bit-wise AND and bit-wise XOR instructions are the same, namely $(p-2)\lceil m/\omega \rceil =
(tm-1)\lceil m/\omega \rceil$. Also, this algorithm needs $2(p-2)\lceil m/\omega \rceil = 2(tm-1)\lceil m/\omega \rceil$ cyclic
shifts. We assume that an i-fold, $1 \le i < \omega$, left/right shift can be emulated in
the C programming language using a total of $\rho$ instructions. The value of $\rho$ is
typically 4 when simple logical instructions, such as AND, SHIFT, and OR are
used. We can now state the following theorem.

Theorem 1. The dynamic instruction count for Algorithm 2 is given by

\[
\#\,\text{Instructions} \approx 2(1 + \rho)(tm - 1) \lceil m/\omega \rceil.
\]
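As a worked instance of Theorem 1, the count can be evaluated for concrete parameters. The helper below is a hypothetical illustration; the default value of $\rho$ is the typical figure quoted in the text.

```python
def instruction_count(m, t, omega, rho=4):
    """Approximate dynamic instruction count 2(1 + rho)(tm - 1) * ceil(m / omega)."""
    words = -(-m // omega)               # ceil(m / omega) in integer arithmetic
    return 2 * (1 + rho) * (t * m - 1) * words
```

For m = 163, t = 4 and a 32-bit data-path, this evaluates to 2 · 5 · 651 · 6 = 39060 instructions.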




4 Efficient NB Multiplication over GF(2^m)

In this section, we develop another algorithm for normal basis multiplication. We
also analyze the cost of this algorithm in terms of dynamic instruction counts
and memory requirements and then compare them with those of other similar
algorithms.

4.1 Algorithm

For the normal basis $\{\beta, \beta^2, \ldots, \beta^{2^{m-1}}\}$, let $\delta_j = \beta^{1+2^j}$, $j = 1, 2, \ldots, v$, where
$v = \lfloor m/2 \rfloor$. Then one has the following result from [16].

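Lemma 2. Let A and B be two elements of $GF(2^m)$ and C be their product.
Then

\[
C = \begin{cases}
\sum_{i=0}^{m-1} a_i b_i \beta^{2^{i+1}} + \sum_{i=0}^{m-1} \sum_{j=1}^{v} x_{i,j}\, \delta_j^{2^i}, & \text{for } m \text{ odd} \\[4pt]
\sum_{i=0}^{m-1} a_i b_i \beta^{2^{i+1}} + \sum_{i=0}^{m-1} \sum_{j=1}^{v-1} x_{i,j}\, \delta_j^{2^i} + \sum_{i=0}^{v-1} x_{i,v}\, \delta_v^{2^i}, & \text{for } m \text{ even}
\end{cases}
\]

where $a_i$'s and $b_i$'s are the NB coordinates of A and B, respectively. Also, indices
and exponents are reduced mod m and

\[
x_{i,j} = a_i b_{i+j} + a_{i+j} b_i, \quad 1 \le j \le v, \ 0 \le i \le m-1. \tag{8}
\]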

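Let $h_j$, $1 \le j \le v$, be the number of 1's in the normal basis representation
of $\delta_j$. Let $w_{j,1}, w_{j,2}, \ldots, w_{j,h_j}$ denote the positions of 1's in the normal basis
representation of $\delta_j$, i.e.,

\[
\delta_j = \sum_{k=1}^{h_j} \beta^{2^{w_{j,k}}}, \quad 1 \le j \le v, \tag{9}
\]

where $0 \le w_{j,1} < w_{j,2} < \cdots < w_{j,h_j} \le m-1$. Now, using (9) into Lemma 2, we
have the following for m odd.

\begin{align}
C &= \sum_{i=0}^{m-1} a_i b_i \beta^{2^{i+1}} + \sum_{i=0}^{m-1} \sum_{j=1}^{v} x_{i,j} \left( \sum_{k=1}^{h_j} \beta^{2^{i+w_{j,k}}} \right) \notag \\
  &= \sum_{i=0}^{m-1} a_i b_i \beta^{2^{i+1}} + \sum_{j=1}^{v} \sum_{k=1}^{h_j} \left( \sum_{i=0}^{m-1} x_{i,j}\, \beta^{2^{i+w_{j,k}}} \right). \tag{10}
\end{align}

Also, for even values of m, one has $v = \frac{m}{2}$ and $\delta_v = \delta_v^{2^v}$. This implies
that in the normal basis representation of $\delta_v$, its i-th coordinate is equal to its
$(\frac{m}{2} + i \bmod m)$-th coordinate. Thus, $h_v$ is even and one can write

\[
\delta_v = \sum_{k=1}^{h_v/2} \left( \beta^{2^{w_{v,k}}} + \beta^{2^{w_{v,k}+v}} \right), \quad v = \frac{m}{2}. \tag{11}
\]

Now, using (11) into Lemma 2 (for m even) and using (10), we have the following
theorem, where all indices and exponents are reduced modulo m.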

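Theorem 2. Let A and B be two elements of $GF(2^m)$ and C be their product.
Then

$$C = \begin{cases}
\displaystyle\sum_{i=0}^{m-1} a_i b_i \beta^{2^{i+1}} + \sum_{j=1}^{v} \sum_{k=1}^{h_j} \left( \sum_{i=0}^{m-1} x_{i,j}\, \beta^{2^{i+w_{j,k}}} \right), & \text{for } m \text{ odd} \\
\displaystyle\sum_{i=0}^{m-1} a_i b_i \beta^{2^{i+1}} + \sum_{j=1}^{v-1} \sum_{k=1}^{h_j} \left( \sum_{i=0}^{m-1} x_{i,j}\, \beta^{2^{i+w_{j,k}}} \right) + F, & \text{for } m \text{ even}
\end{cases} \qquad (12)$$

where

$$F = \sum_{k=1}^{h_v/2} \left( \sum_{i=0}^{m-1} x_{i,v}\, \beta^{2^{i+w_{v,k}}} \right), \text{ and } v = \frac{m}{2}.$$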



Note that for a normal basis, the representation of $\delta_j$ is fixed and so is
$w_{j,k}$, $1 \le j \le v$, $1 \le k \le h_j$. Now, define

$$\Delta w_{j,k} = w_{j,k} - w_{j,k-1}, \quad 1 \le j \le v, \; 1 \le k \le h_j, \quad w_{j,0} = 0, \qquad (13)$$

where $w_{j,k}$'s are as given in (9). For a particular normal basis, all $w_{j,k}$'s are fixed.
Hence, all $\Delta w_{j,k}$'s need to be determined only at the time of choosing the basis.
Using $\Delta w_{j,k}$'s, below we present an efficient NB (ENB) multiplication algorithm
over $GF(2^m)$ for odd values of m. The corresponding algorithm for even values
of m is shown in [18]. Also, an efficient scheme to compute $\Delta w_{j,k}$'s is presented
in [18].
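As a small illustration of definition (13), the differences can be obtained from the positions of 1's with a running subtraction. This is a sketch only, not the scheme of [18]; the function name `delta_w` is an assumption, and the sample positions are those given in Example 1 of this section.

```python
def delta_w(w):
    """Turn the positions w[j][k] of (9) into the differences of (13)."""
    result = []
    for positions in w:
        previous = 0                     # w_{j,0} = 0 by the convention in (13)
        row = []
        for p in positions:
            row.append(p - previous)     # Delta w_{j,k} = w_{j,k} - w_{j,k-1}
            previous = p
        result.append(row)
    return result


print(delta_w([[0, 3], [3, 4]]))  # [[0, 3], [3, 1]]
```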



Algorithm 3 (ENB Multiplication for m Odd)

Input: $A, B \in GF(2^m)$, $\Delta w_{j,k} \in [0, m-1]$, $1 \le j \le v$, $1 \le k \le h_j$, $v = \frac{m-1}{2}$

Output: $C = AB$

1. Initialize $C := A \odot B$, $S_A := A$, $S_B := B$

2. $C \gg 1$

3. For j = 1 to v {

4. $S_A \ll 1$, $S_B \ll 1$

5. $T_A := A \odot S_B$, $T_B := B \odot S_A$

6. $R := T_A + T_B$

7. For k = 1 to $h_j$ {

8. $R \gg \Delta w_{j,k}$

9. $C := C + R$

10. }

11. }

In the above algorithm, shifted values of A and B are stored in $S_A$ and
$S_B$, respectively. In line 6, $R \in GF(2^m)$ contains $(x_{0,j}, x_{1,j}, \ldots, x_{m-1,j})$, i.e.,
$\sum_{i=0}^{m-1} x_{i,j} \beta^{2^i}$. Also, the right cyclic shift of R in line 8 corresponds to
$\sum_{i=0}^{m-1} x_{i,j} \beta^{2^{i+w_{j,k}}}$. After the final iteration, C is the normal basis representation
of the required product AB. To illustrate the operation of the above algorithm,
we present the following example.

Example 1. Consider the finite field $GF(2^5)$ generated by the irreducible polynomial
$F(z) = z^5 + z^2 + 1$ and let $\alpha$ be its root, i.e., $F(\alpha) = 0$. We choose $\beta = \alpha^5$,
then $\{\beta, \beta^2, \beta^4, \beta^8, \beta^{16}\}$ is a type 2 GNB. Here m = 5, and $v = \frac{m-1}{2} = 2$. Using
Table 2 in [12], one has

$\delta_1 = \beta^3 = \beta + \beta^8$, $h_1 = 2$, $[w_{1,k}]_{k=1}^{2} = [0, 3]$,

$\delta_2 = \beta^5 = \beta^8 + \beta^{16}$, $h_2 = 2$, $[w_{2,k}]_{k=1}^{2} = [3, 4]$.

Let $A = \beta^2 + \beta^4 + \beta^8 = (01110)$ and $B = \beta + \beta^4 + \beta^{16} = (10101)$ be two field
elements. Table 1 shows the contents of the variables of the algorithm as they
are updated. The row in which j is shown as "-" is for the initialization step (i.e., line 1)
of the algorithm.

Table 1. Contents of variables in Algorithm 3 for multiplication of A = (01110) and
B = (10101).

j    S_A     S_B     T_A     T_B     k    Δw_{j,k}   R       C
-    01110   10101   -       -       -    -          -       00010
1    11100   01011   01010   10100   1    0          11110   11100
                                     2    3          11011   00111
2    11001   10110   00110   10001   1    3          11110   11001
                                     2    1          01111   10110
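The steps of Algorithm 3 can be traced with a short sketch. It is written in Python on lists of bits for readability and is not the word-level C code used for the timings; the helper names (`rotl`, `rotr`, `enb_multiply`) are assumptions, and the coordinate vector is written as $(a_0, a_1, \ldots, a_{m-1})$ so that a left cyclic shift moves $a_1$ into the first position, as in Table 1.

```python
def rotl(x, i):
    """Left cyclic shift of a coordinate list by i positions."""
    i %= len(x)
    return x[i:] + x[:i]


def rotr(x, i):
    """Right cyclic shift of a coordinate list by i positions."""
    i %= len(x)
    return x[len(x) - i:] + x[:len(x) - i]


def xor(x, y):
    return [p ^ q for p, q in zip(x, y)]


def band(x, y):
    return [p & q for p, q in zip(x, y)]


def enb_multiply(a, b, delta_w):
    """Algorithm 3 for m odd; delta_w[j-1][k-1] holds Delta w_{j,k}."""
    c = rotr(band(a, b), 1)                    # lines 1 and 2
    s_a, s_b = a, b
    for deltas in delta_w:                     # line 3
        s_a, s_b = rotl(s_a, 1), rotl(s_b, 1)  # line 4
        r = xor(band(a, s_b), band(b, s_a))    # lines 5 and 6
        for d in deltas:                       # line 7
            r = rotr(r, d)                     # line 8
            c = xor(c, r)                      # line 9
    return c


def bits(s):
    return [int(ch) for ch in s]


product = enb_multiply(bits("01110"), bits("10101"), [[0, 3], [3, 1]])
print("".join(str(bit) for bit in product))  # 10110
```

The printed value agrees with the final entry of column C in Table 1. Since the coordinates of a normal basis sum to the field identity here, multiplying by (11111) returns the other operand unchanged, which gives a second quick check.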



As can be seen in Algorithm 3, all $\Delta w_{j,k}$'s have to be pre-computed. In
the above example, they are determined by calculating the $\delta_j$'s, which is essentially
a multiplication process all by itself. For this multiplication, one can use either
Algorithm 1 or Algorithm 2. However, an efficient scheme which does not need
multiplication is presented in [18].
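For a field as small as the one in Example 1, the positions $w_{j,k}$ can also be found by brute force. The sketch below is an assumption-laden illustration, not the scheme of [18]: it works in the polynomial basis of $F(z) = z^5 + z^2 + 1$, takes $\beta = \alpha^5$, and searches all 32 coordinate vectors for the one that represents $\delta_j = \beta^{1+2^j}$. The helper names are hypothetical.

```python
F = 0b100101   # z^5 + z^2 + 1
M = 5


def gf_mul(x, y):
    """Multiply two elements of GF(2^5) given as bit masks in the polynomial basis."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x >> M:          # reduce as soon as the degree reaches 5
            x ^= F
    return result


def gf_pow(x, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, x)
    return result


alpha = 0b00010
beta = gf_pow(alpha, 5)

# Normal basis elements beta^(2^i), i = 0..4, obtained by repeated squaring.
basis = []
element = beta
for _ in range(M):
    basis.append(element)
    element = gf_mul(element, element)


def nb_positions(target):
    """Return the positions of 1's in the normal basis representation of target."""
    for mask in range(1 << M):
        acc = 0
        for i in range(M):
            if (mask >> i) & 1:
                acc ^= basis[i]
        if acc == target:
            return [i for i in range(M) if (mask >> i) & 1]
    raise ValueError("element not representable; the set is not a basis")


w = [nb_positions(gf_pow(beta, 1 + 2 ** j)) for j in (1, 2)]
print(w)  # [[0, 3], [3, 4]]
```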

4.2 Cost and Comparison

In an effort to determine the cost of Algorithm 3, we give the dynamic instruction
counts for its software implementation. We also consider the number of memory
accesses to read the pre-computed values of $\Delta w_{j,k}$. For software implementation
of the above algorithm, one would rely heavily on instructions such as XOR,
AND and others which can be used to emulate cyclic shifts (in a C-like programming
language). XOR instructions are needed in lines 6 and 9, which are repeated
$v$ and $\sum_{j=1}^{v} h_j$ times, respectively. Since $v = \frac{m-1}{2}$ and $\sum_{j=1}^{v} h_j = \frac{C_N - 1}{2}$
[10], the total number of XOR instructions is $\frac{1}{2}(C_N + m - 2)\lceil m/\omega \rceil$. Because
of the $\odot$ operations in lines 1 and 5, one can also see that the above algorithm
requires $m \lceil m/\omega \rceil$ AND instructions. We assume that each i-fold cyclic shift,
$1 \le i \le m-1$, in lines 2, 4 and 8 needs $\rho \lceil m/\omega \rceil$ instructions where $\rho$ is as defined
earlier. In Algorithm 3, the numbers of cyclic shifts in lines 2, 4 and 8 are
1, $2v$ and $\sum_{j=1}^{v} h_j$, respectively. Thus, the total number of cyclic shifts in this
algorithm is $1 + 2v + \sum_{j=1}^{v} h_j = \frac{1}{2}(C_N + 2m - 1)$ and so the total number of
instructions to emulate cyclic shifts used in Algorithm 3 is $\frac{\rho}{2}(C_N + 2m - 1)\lceil m/\omega \rceil$.
Based on the above discussion, we have the following theorem.



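Theorem 3. The dynamic instruction count for Algorithm 3 is given by

$$\#\,\text{Instructions} \approx \left( \frac{1+\rho}{2}\, C_N + \frac{3+2\rho}{2}\, m - \frac{2+\rho}{2} \right) \left\lceil \frac{m}{\omega} \right\rceil .$$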

For software implementation of Algorithm 3, if the loops are not unrolled and
the values of $\Delta w_{j,k}$'s are not hard-coded, one needs to store all these $\Delta w_{j,k}$, $1 \le
j \le v$, $1 \le k \le h_j$. Since the total number of $\Delta w_{j,k}$'s is $\frac{C_N - 1}{2}$ and each
$\Delta w_{j,k} \in [0, m-1]$ needs $\lceil \log_2 m \rceil$ bits of memory, a total of about $\frac{C_N - 1}{2} \lceil \log_2 m \rceil$
bits of memory is needed to store the pre-computed $\Delta w_{j,k}$'s.
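The counts of Theorems 1 and 3 are easy to evaluate for concrete parameters. The sketch below is an illustration under the formulas stated in this section; the function names are assumptions, and the sample values m = 163, t = 4, $C_N$ = 645 with 32-bit words are those listed for the smallest NIST field in Table 3.

```python
from math import ceil, log2


def alg2_counts(m, t, omega=32, rho=4):
    """XOR, AND, shift count and total instructions for Algorithm 2 (Theorem 1)."""
    words = ceil(m / omega)
    xor = (t * m - 1) * words
    and_ = (t * m - 1) * words
    shifts = 2 * (t * m - 1) * words
    return xor, and_, shifts, xor + and_ + rho * shifts


def alg3_counts(m, c_n, omega=32, rho=4):
    """Counts for Algorithm 3 (Theorem 3); m and c_n are both assumed odd."""
    words = ceil(m / omega)
    xor = (c_n + m - 2) // 2 * words
    and_ = m * words
    shifts = (c_n + 2 * m - 1) // 2 * words
    memory_bits = (c_n - 1) // 2 * ceil(log2(m))
    return xor, and_, shifts, xor + and_ + rho * shifts, memory_bits


print(alg2_counts(163, 4))    # (3906, 3906, 7812, 39060)
print(alg3_counts(163, 645))  # (2418, 978, 2910, 15036, 2576)
```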



Table 2. Comparison of multiplication algorithms in terms of number of instructions
and memory requirements.

Algorithm         # Instructions                                                                              Memory
                  XOR                            AND                     Others                               Size in bits                        # Accesses
Alg. 1            (tm - 1)                       m(tm - 1)               (not legible)                        (tm - 1)⌈log2 m⌉                    2m(tm - 1)
Alg. 2            (tm - 1)⌈m/ω⌉                  (tm - 1)⌈m/ω⌉           2ρ(tm - 1)⌈m/ω⌉                      (tm/2)⌈log2 m⌉                      tm
Alg. 3            (1/2)(C_N + m - 2)⌈m/ω⌉        m⌈m/ω⌉                  (ρ/2)(C_N + 2m - 1)⌈m/ω⌉             ((C_N - 1)/2)⌈log2 m⌉               (C_N - 1)/2
Ratio of
Alg. 2 to Alg. 3  ≈ 2t/(t+1)                     ≈ t                     ≈ 4t/(t+2)                           ≈ 1                                 ≈ 2



Table 2 compares the number of dynamic instructions of the three algorithms
we have described so far. This table also gives memory sizes and numbers of memory
accesses of these algorithms. As can be seen in Table 2, both our proposed
schemes (i.e., Algorithms 2 and 3) are superior to the conventional bit-level
multiplication scheme (i.e., Algorithm 1). The final row of Table 2 gives approximate
improvement factors of Algorithm 3 over Algorithm 2. A more detailed
comparison of these two algorithms is given in Table 3 for the five binary fields
recommended by NIST for ECDSA (elliptic curve digital signature algorithm)
[13]. We have also coded these algorithms in software using the C programming
language. Table 3 also shows timing (in μs) for these codes executed on a Pentium
III 533 MHz PC^. Our codes are parameterized in the sense that they
can be used for various m and t without major modifications. For high speed
implementation, the codes can be optimized for special values of m and t.

Agnew et al. in [1] have proposed a bit-serial architecture for the NB multiplication.
Although their work has been targeted to hardware implementation,
the main idea can be used for software implementation similar to the
vector level method proposed here. For such a software implementation of [1],
one would require $(C_N - 1)\lceil m/\omega \rceil$ XOR instructions, $m \lceil m/\omega \rceil$ AND instructions,
and $\rho(C_N + m - 1)\lceil m/\omega \rceil$ other instructions. Thus, the dynamic instruction count
would be $(\rho + 1)(C_N + m - 1)\lceil m/\omega \rceil$, which is about twice that of Algorithm 3
(see Theorem 3). In [19], one can find software implementation of the NB multiplication
for two special cases, namely, two optimal normal bases. The method
used in [19] is similar to that of the NB multiplication of [1].

Some of the recently proposed polynomial basis multiplication algorithms,
for example [6], [9], create a look-up table on the fly based on one of the inputs
(say B) and yield significant speed-ups by processing a group of bits of the other
input (i.e., A) at a time. At this point, it is not clear whether such group-level
processing of A can be incorporated into our Algorithm 3. However, if m is a
composite number, then one can essentially achieve a similar kind of group-level
processing by performing computations in the sub-fields. This idea is explored
in the following section.



Table 3. Comparison of the proposed algorithms for binary fields recommended by
NIST for ECDSA applications (ω = 32). Each paired entry lists Algorithm 2, then
Algorithm 3.

Parameters       # Instructions                                                       Memory                      Timing
m    t   C_N     XOR             AND             Others / ρ      Total (ρ = 4)        Size in bits   Accesses     in μs        Ratio
163  4   645     3906, 2418      3906, 978       7812, 2910      39060, 15036         2608, 2576     652, 322     307, 99      3.1:1
233  2   465     3720, 2784      3720, 1864      7440, 3720      37200, 19528         1864, 1856     466, 232     346, 126     2.75:1
283  6   1677    15273, 8811     15273, 2547     30546, 10089    152730, 51714        7641, 7542     1698, 838    1005, 318    3.16:1
409  4   1629    21255, 13234    21255, 5317     42510, 15899    212550, 82147        7362, 7326     1636, 814    1466, 473    3.1:1
571  10  5637    102762, 55854   102762, 10278   205524, 61002   1027620, 310140      28550, 28180   5710, 2818   8423, 2949   2.86:1



^ The PC has 64 M bytes of RAM, 32 K bytes of L1 cache and 512 K bytes of L2
cache.

5 Efficient Composite Field NB Multiplication Algorithm



In this section, we consider multiplications in the finite field $GF(2^m)$ where m
is a composite number. These fields are referred to as composite fields and have
been used in the recent past to develop efficient multiplication schemes [14],
[15]. When these fields are to be used for elliptic curve crypto-systems, one must
choose m such that its factors are large enough to resist the attack described by
Galbraith and Smart [4].

Lemma 3. [11] Let $\gcd(m_1, m_2) = 1$. Let $N_1 = \{\beta_1^{2^j} \mid 0 \le j \le m_1 - 1\}$ be
a normal basis of $GF(2^{m_1})$ over $GF(2)$. Then $N_1$ is also a normal basis of
$GF(2^{m_1 m_2})$ over $GF(2^{m_2})$.

Here, we consider composite fields with only two prime factors^ (i.e., both $m_1$
and $m_2$ are prime). Thus, in the following we give all equations and algorithms for
odd degrees (i.e., $m_1$ and $m_2$). The reader can easily extend them to even degrees
using the results of the previous section. Also, the parameters, namely $\delta_j$, $h_j$, $v$,
$\beta$, and $\Delta w_{j,k}$ of the previous section are used here in the context of the sub-fields
$GF(2^{m_1})$ and $GF(2^{m_2})$ by putting an extra sub/superscript, for example $\delta_j^{(1)}$ for
$GF(2^{m_1})$ and $\delta_j^{(2)}$ for $GF(2^{m_2})$.

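Let A and B be two elements of $GF(2^{m_1})$ over $GF(2)$ and C be their product.
Then we have the following from [17].

$$C = \sum_{i=0}^{m_1-1} a_i b_i \beta_1^{2^i} + \sum_{j=1}^{v_1} \sum_{k=1}^{h_j^{(1)}} \left( \sum_{i=0}^{m_1-1} y_{i,j}\, \beta_1^{2^{i+w_{j,k}^{(1)}}} \right), \quad \text{for } m_1 \text{ odd} \qquad (14)$$

where

$$y_{i,j} = (a_i + a_{i+j})(b_i + b_{i+j}), \quad 1 \le j \le v_1, \; 0 \le i \le m_1 - 1, \quad v_1 = \frac{m_1 - 1}{2}.$$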

By combining Lemma 3 with (14), the following is obtained. 

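Lemma 4. Let $A = (A_0, A_1, \ldots, A_{m_1-1})$ and $B = (B_0, B_1, \ldots, B_{m_1-1})$ be two
elements of $GF(2^{m_1 m_2})$ over $GF(2^{m_2})$ and C be their product. Then

$$C = \sum_{i=0}^{m_1-1} A_i B_i \beta_1^{2^i} + \sum_{j=1}^{v_1} \sum_{k=1}^{h_j^{(1)}} \left( \sum_{i=0}^{m_1-1} Y_{i,j}\, \beta_1^{2^{i+w_{j,k}^{(1)}}} \right), \quad \text{for } m_1 \text{ odd} \qquad (15)$$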

^ This is important for elliptic curve crypto-systems. For such systems in today's
security applications, the values of m appear to be in the range of 160 to several
hundreds only (571 as given in [13]). To avoid the attack of [4], one may however
prefer to choose m such that it has no small factors such as 2, 3, 5, 7, 11. This
essentially leads one to choose m as the product of two primes.







where

$$Y_{i,j} = (A_i + A_{i+j})(B_i + B_{i+j}), \quad 1 \le j \le v_1, \; 0 \le i \le m_1 - 1, \qquad (16)$$

and $A_i = (a_{i,0}, a_{i,1}, \ldots, a_{i,m_2-1})$, $B_i = (b_{i,0}, b_{i,1}, \ldots, b_{i,m_2-1}) \in GF(2^{m_2})$ are
sub-field coordinates of A and B.

Lemma 4 leads to an algorithm for multiplication in composite fields using
normal basis. The algorithm is stated below.

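Algorithm 4 (ECFNB Multiplication of $GF(2^m)$ over $GF(2^{m_2})$)

Input: $A, B \in GF(2^m)$, $\Delta w_{j,k}^{(1)} \in [0, m_1 - 1]$, $1 \le j \le v_1$, $1 \le k \le h_j^{(1)}$, $v_1 = \frac{m_1 - 1}{2}$

Output: $C = AB$

1. Initialize $C := A \otimes B$, $S_A := A$, $S_B := B$

2. For j = 1 to $v_1$ {

3. $S_A \ll m_2$, $S_B \ll m_2$

4. $T_A := A + S_A$, $T_B := B + S_B$

5. $R := T_A \otimes T_B$

6. For k = 1 to $h_j^{(1)}$ {

7. $R \gg m_2 \Delta w_{j,k}^{(1)}$

8. $C := C + R$

9. }

10. }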

In lines 1 and 5 of Algorithm 4, $A \otimes B = (A_0 B_0, A_1 B_1, \ldots, A_{m_1-1} B_{m_1-1})$
denotes parallel sub-field multiplications of A and B. This sub-field multiplication
can be implemented with an extension of Algorithm 3 such that it produces
$m_1$ sub-field multiplications over $GF(2^{m_2})$. This is shown in Algorithm 5 where
$A \triangleright i$ (resp. $A \triangleleft i$), $0 \le i \le m_2 - 1$, denotes an i-fold right (resp. left) sub-field
cyclic shift of all sub-field elements of A, i.e., $A_0, A_1, \ldots, A_{m_1-1}$, respectively.

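Algorithm 5 (Parallel Sub-Field Multiplication over $GF(2^{m_2})$)

Input: $A, B \in GF(2^m)$, $\Delta w_{j,k}^{(2)} \in [0, m_2 - 1]$, $1 \le j \le v_2$, $1 \le k \le h_j^{(2)}$, $v_2 = \frac{m_2 - 1}{2}$

Output: $C = A \otimes B$

1. Initialize $C := A \odot B$, $S_A := A$, $S_B := B$

2. $C \triangleright 1$

3. For j = 1 to $v_2$ {

4. $S_A \triangleleft 1$, $S_B \triangleleft 1$

5. $T_A := A \odot S_B$, $T_B := B \odot S_A$

6. $R := T_A + T_B$

7. For k = 1 to $h_j^{(2)}$ {

8. $R \triangleright \Delta w_{j,k}^{(2)}$

9. $C := C + R$

10. }

11. }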

In order to obtain the cost of Algorithm 4, we need to evaluate the cost of
Algorithm 5, which is called $1 + v_1 = \frac{m_1 + 1}{2}$ times by the former. Like Algorithm
3, one can determine the dynamic instruction counts of Algorithm 5 to be $\frac{1}{2}(C_2 +
m_2 - 2)$ XOR, $m_2$ AND and $\frac{1}{2}(C_2 + 2m_2 - 1)$ others to emulate cyclic shifts,
for each of the $m_1$ sub-field elements. The
total cost of Algorithm 4 also depends on how sub-field elements, each of $m_2$ bits,
are stored in registers. For the sake of simplicity we assume that an element of
$GF(2^{m_2})$ is stored in one ω-bit register (for software implementation of elliptic
curve crypto-systems with both $m_1$ and $m_2$ being prime, most general purpose
processors would have ω-bit registers where $\omega \ge m_2$). For ω = 24 and 32, the
best values of $m_2$ are those which have ONBs, i.e., 23 and 29, respectively. Thus,
each element of $GF(2^m)$ needs $m_1$ registers and the cyclic shifts in lines 3 and
7 of Algorithm 4 are almost free of cost (or at best register renaming). Based
on this assumption, we give the dynamic instruction counts of Algorithm 4 in
Table 4. In this table, μ is the number of instructions needed for one sub-field
cyclic shift in each register and it is 4 in the C programming language.

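Table 4. Cost of Algorithm 4.

# Instructions
  XOR            $\frac{m_1}{2}(C_1 + 2m_1 - 3) + \frac{m_1(m_1+1)}{4}(C_2 + m_2 - 2)$
  AND            $\frac{m_1 m_2 (m_1 + 1)}{2}$
  Others         $\frac{\mu\, m_1 (m_1 + 1)}{4}(C_2 + 2m_2 - 1)$
Memory
  Size in bits   $\frac{C_1 - 1}{2} \lceil \log_2 m_1 \rceil + \frac{C_2 - 1}{2} \lceil \log_2 m_2 \rceil$
  # Accesses     $\frac{C_1 - 1}{2} + \frac{(m_1 + 1)(C_2 - 1)}{4}$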



Table 5 shows the number of instructions and memory requirements of Algorithm
4 for six different composite fields. These six fields are obtained by
combining three $m_1$'s and two $m_2$'s. Algorithm 4 is also coded for these composite
fields using the C programming language. The actual timings (in μs) of
Algorithm 4 executed on a Pentium III 533 MHz PC are also shown in Table 5.
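The cost expressions of Table 4 can be evaluated directly. The sketch below is an illustration only: the function name `alg4_cost` is an assumption, the formulas are the ones read from Table 4, and the sample parameters ($m_1$ = 13, $m_2$ = 23, $C_1 = C_2$ = 45, μ = 4) are those of the first composite field in Table 5.

```python
from math import ceil, log2


def alg4_cost(m1, m2, c1, c2, mu=4):
    """Instruction and memory counts of Algorithm 4 following Table 4."""
    calls = (m1 + 1) // 2   # number of times Algorithm 5 is invoked
    xor = m1 * (c1 + 2 * m1 - 3) // 2 + calls * m1 * (c2 + m2 - 2) // 2
    and_ = calls * m1 * m2
    others = mu * calls * m1 * (c2 + 2 * m2 - 1) // 2
    memory_bits = ((c1 - 1) // 2 * ceil(log2(m1))
                   + (c2 - 1) // 2 * ceil(log2(m2)))
    accesses = (c1 - 1) // 2 + calls * (c2 - 1) // 2
    return {
        "xor": xor,
        "and": and_,
        "others": others,
        "total": xor + and_ + others,
        "memory_bits": memory_bits,
        "accesses": accesses,
    }


print(alg4_cost(13, 23, 45, 45))
```

For these parameters the sketch yields 3445 XOR, 2093 AND, 16380 other instructions, 21918 in total, 198 bits of memory and 176 accesses, matching the first row of Table 5.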



Table 5. Cost of Algorithm 4 for certain composite fields (μ = 4).

Parameters                 # Instructions                    Memory                       Actual timing
m    m1  m2  C1   C2       XOR    AND    Others   Total      Size in bits   # Accesses    (in μs)
299  13  23  45   45       3445   2093   16380    21918      198            176           114
377  13  29  45   57       4264   2639   20748    27651      228            218           150
391  17  23  81   45       6001   3519   27540    37060      310            238           188
437  19  23  117  45       7714   4370   34200    46284      400            278           249
493  17  29  81   57       7378   4437   34884    46699      340            292           242
551  19  29  117  57       9424   5510   43320    58254      430            338           309

6 Conclusions

In this article, we have presented a number of software algorithms for normal
basis multiplication over $GF(2^m)$. Both Algorithms 2 and 3 make maximal use
of the full width of the data-path of the processor on which the software is to be
executed and they provide significant speed-ups compared to the conventional
bit-level multiplication scheme (i.e., Algorithm 1). Algorithms 2 and 3 are particularly
suitable if m is a prime. Such values of m are of importance, especially for
designing high speed crypto-systems based on Koblitz curves and for protecting
elliptic curve crypto-systems against the attack of Galbraith and Smart [4]. Both
Algorithms 2 and 3 have been coded for software implementation using C, and
our timing results show that Algorithm 3 is about 200% faster than Algorithm
2. These results are for the five Gaussian normal bases over the binary fields
which NIST has described in their ECDSA document [13]. Although we have
presented our results for Gaussian normal bases for the purpose of using NIST
parameters, our algorithms are quite generic and can be used for any normal
basis of $GF(2^m)$ over $GF(2)$.

We have also considered composite fields with $m = m_1 \cdot m_2$. To avoid the
attack of [4] on elliptic curve crypto-systems defined over these composite fields,
we choose both $m_1$ and $m_2$ to be prime. We have presented an algorithm (i.e.,
Algorithm 4) for normal basis multiplication for $GF(2^m)$ over $GF(2^{m_2})$. Our
results show that for similar values of m, Algorithm 4 can be much more efficient
than Algorithm 3. For example, the actual timing of Algorithm 3 is 318 microseconds
for $GF(2^{283})$ whereas the timing of Algorithm 4 is 114 microseconds
for $GF(2^{299})$. Composite fields also provide an added flexibility for hardware-software
co-design of finite field processors. For example, Algorithm 5, which is
called by Algorithm 4 a total of $\frac{m_1 + 1}{2}$ times, can be implemented in hardware
for small values of $m_2$, and Algorithm 4 can be embedded in a micro-controller,
which would give us a high speed, yet quite flexible, normal basis multiplier over
very large fields.

Abstract. A new method for multiplication of large integers, designed
for efficient software implementation, is presented and compared
with the well-known “schoolbook” method that is currently used for
both software and hardware implementations of public-key cryptographic
techniques. The comparison for the software-efficient method is made in
terms of the required number of basic operations on small integers. It
is shown that a significant performance gain is achieved by the new
software-efficient method for integers from 192 to 1024 bits in length,
which is the range of interest for all current public-key implementations.
For 1024-bit integer multiplication, the savings over the schoolbook
method are conservatively estimated to be about 33%. A new method for
multiplication of large integers, which is analogous to the new software-efficient
method but is designed for efficient hardware implementation,
is also presented and compared to the schoolbook method in terms of
the number of processor clock cycles required.



1 Introduction 

Multiplication of large integers plays a decisive role in the efficient implementation
of all existing public-key cryptographic techniques such as the Diffie-Hellman
and the elliptic-curve key-agreement protocols and the Rivest-Shamir-Adleman
(RSA) cryptosystem. The standard “schoolbook” method of multiplication
is today the most used method for integer multiplication in practical
public-key systems, cf. pp. 630-631 in [1]. For very large integers beyond the range
of practical interest in current cryptographic systems, more efficient methods of
multiplication are known, cf. [2]. One of these methods that has some practical
significance is that due to Karatsuba and Ofman [3], which reduces the asymptotic
complexity of multiplying two N-bit integers to $O(N^{\log_2 3})$ bit operations
compared to $O(N^2)$ bit operations for the schoolbook method.

The main contribution of this paper is a new software-efficient method of multiplication
that improves on the schoolbook method when used in any current
public-key cryptographic application. Section 2 provides a brief description of
the schoolbook and the Karatsuba-Ofman methods. In Section 3 we introduce
the new software-efficient method of multiplication and compare its complexity
with that of the schoolbook method. We also provide explicit performance
figures for the new software-efficient method and for the schoolbook method
for integers in the range from 192 to 1024 bits in length, which is the range
of interest for all current public-key techniques. In Section 4 we introduce a
new hardware-efficient method of multiplication, which is analogous to the new
software-efficient method, and we compare its complexity in hardware with that
of the schoolbook method.

2 Schoolbook and Karatsuba-Ofman Methods 

Let $\beta = 2^w$ be the radix in which integers are represented for calculation. Normally,
w is the word size in bits of the processor on which the algorithm is
implemented. By an n-symbol integer, we will mean an integer between 0 and
$\beta^n - 1$ inclusive, i.e., an integer that can be written as an n-place radix-$\beta$ integer.
Note that a symbol is a w-bit integer and that an n-symbol integer is an N-bit
integer where N = nw.

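Let $A = (a_{n-1}, a_{n-2}, \ldots, a_0)$ and $B = (b_{n-1}, b_{n-2}, \ldots, b_0)$, where $a_i$ and $b_i$ are
w-bit integers, be two n-symbol integers. The result of their multiplication is
the 2n-symbol integer $A \cdot B$ where

$$A \cdot B = \sum_{i=0}^{n-1} a_i \beta^i \cdot \sum_{j=0}^{n-1} b_j \beta^j$$

or, equivalently,

$$A \cdot B = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j \beta^{i+j}. \qquad (1)$$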

The schoolbook method of multiplication computes $A \cdot B$ essentially by carrying
out the $n^2$ multiplications of w-bit integers in (1), one for each of the $n^2$
terms, and adding coefficients of like powers of $\beta$. The schoolbook method thus
requires $n^2$ multiplications of w-bit integers to calculate the product of two
n-symbol integers. The precise order in which the multiplications and additions
are carried out will not concern us here, but this order affects the “overhead”
in implementing the schoolbook method.

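For counting the number of additions required by the schoolbook method, it
is convenient first to write the 2w-bit integer $a_i \cdot b_j$ in (1) as $c_{ij}\beta + d_{ij}$ where
$c_{ij}$ and $d_{ij}$ are w-bit integers. Then (1) can be written as

$$A \cdot B = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (c_{ij}\beta + d_{ij})\, \beta^{i+j} = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} c_{ij}\, \beta^{i+j+1} + \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} d_{ij}\, \beta^{i+j}. \qquad (2)$$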

We note that there are only 2n distinct powers of $\beta$ among the $2n^2$ terms in (2).
Because each addition of coefficients of some power of $\beta$ reduces the number of
terms by one, it follows that exactly $2n^2 - 2n = 2n(n-1)$ additions of w-bit
integers are required to add the coefficients of like powers of $\beta$ in (2). Thus, the
schoolbook method requires $2n(n-1)$ additions of w-bit integers to calculate
the product of two n-symbol integers. The additions of the terms in (2) with
coefficients $c_{ij}$ are called “carry additions” because these terms originate from
the “overflow” into the next higher w bits when two w-bit integers are multiplied.
Of the $2n(n-1)$ additions of w-bit integers required by the schoolbook method,
exactly half are such carry additions. Finally, we note that each addition of w-bit
integers can result in a bit carry to another w-bit integer, which increments this
latter integer by 1. The schoolbook method requires a maximum of $2n(n-1)$
such carry-bit additions.
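The decomposition in (2) can be illustrated with a short sketch that multiplies two n-symbol integers symbol by symbol and counts the w-bit multiplications. It is a simplified illustration in Python (accumulators are allowed to grow and carries are propagated once at the end), not an implementation from the paper; the helper names and the choice w = 16 are assumptions.

```python
def to_symbols(x, n, w=16):
    """Split a non-negative integer into n w-bit symbols, least significant first."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def from_symbols(symbols, w=16):
    return sum(s << (w * i) for i, s in enumerate(symbols))


def schoolbook(a, b, w=16):
    """Multiply two n-symbol integers; return the 2n product symbols and the multiplication count."""
    n = len(a)
    beta = 1 << w
    acc = [0] * (2 * n)
    mults = 0
    for i in range(n):
        for j in range(n):
            c, d = divmod(a[i] * b[j], beta)   # a_i * b_j = c_ij * beta + d_ij
            mults += 1
            acc[i + j] += d                    # coefficient of beta^(i+j)
            acc[i + j + 1] += c                # carry addition into beta^(i+j+1)
    carry = 0
    for k in range(2 * n):                     # final carry propagation
        carry, acc[k] = divmod(acc[k] + carry, beta)
    return acc, mults


x, y = 0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0
product, mults = schoolbook(to_symbols(x, 4), to_symbols(y, 4))
print(from_symbols(product) == x * y, mults)  # True 16
```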

The Karatsuba-Ofman method [3] is a divide-and-conquer technique for computing
the components of $C = A \cdot B$ based on the following observation. Suppose
that A and B are n-symbol integers where $n = 2^t$. Let $A = \beta^{2^{t-1}} A_1 + A_0$
and $B = \beta^{2^{t-1}} B_1 + B_0$ where $A_0$, $A_1$, $B_0$ and $B_1$ are $2^{t-1}$-symbol integers.
Then $A \cdot B = C_2 \beta^{2^t} + C_1 \beta^{2^{t-1}} + C_0$, where $C_0 = A_0 \cdot B_0$, $C_2 = A_1 \cdot B_1$, and
$C_1 = (A_0 + A_1) \cdot (B_0 + B_1) - C_0 - C_2$. It follows that $C = A \cdot B$ can be computed
by performing three multiplications of $2^{t-1}$-symbol integers together with two
additions and two subtractions of such integers. This procedure is iterated conceptually
t times, i.e., until the integers reach the size of one symbol (w bits), at
which point the multiplications and additions are actually performed. This algorithm
requires only $3^t \approx n^{1.585}$ multiplications of w-bit integers, compared to $n^2$
such multiplications for the schoolbook method. Combining the Karatsuba-Ofman
algorithm with schoolbook multiplication may have some practical significance.
However, the recursive nature of the Karatsuba-Ofman algorithm results in such
a significant overhead that its direct application to integers of the size used in
current public-key cryptography is not efficient, cf. pp. 630-631 in [1].
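The recursion described above can be sketched as follows. This is a minimal illustration on Python integers, not an optimized implementation; the function name and the counter argument are assumptions. One simplification is labeled in the code: the sums $A_0 + A_1$ and $B_0 + B_1$ may be one bit longer than a half-size operand, and the sketch simply lets the base-case operands grow by those few bits.

```python
def karatsuba(a, b, n_symbols, w=16, counter=None):
    """Multiply a and b, viewed as n_symbols-symbol integers (n_symbols a power of two)."""
    if n_symbols == 1:
        if counter is not None:
            counter[0] += 1          # one symbol-level multiplication
        return a * b                 # operands may exceed w bits by a few carry bits
    half = n_symbols // 2
    shift = w * half
    mask = (1 << shift) - 1
    a0, a1 = a & mask, a >> shift    # A = beta^half * A1 + A0
    b0, b1 = b & mask, b >> shift
    c0 = karatsuba(a0, b0, half, w, counter)
    c2 = karatsuba(a1, b1, half, w, counter)
    c1 = karatsuba(a0 + a1, b0 + b1, half, w, counter) - c0 - c2
    return (c2 << (2 * shift)) + (c1 << shift) + c0


count = [0]
x = 0xFEDCBA9876543210FEDCBA9876543210
y = 0x0123456789ABCDEF0123456789ABCDEF
print(karatsuba(x, y, 8, counter=count) == x * y, count[0])  # True 27
```

For n = 8 symbols the counter reaches $3^3 = 27$, compared with $8^2 = 64$ symbol multiplications for the schoolbook method.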



3 A Software-Efficient Multiplication Method 

3.1 The Underlying Idea 

Our new software-efficient multiplication method is based on the formula

$$A \cdot B = \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} (a_u + a_v) \cdot (b_u + b_v)\, \beta^{u+v} + 2 \sum_{u=0}^{n-1} a_u \cdot b_u\, \beta^{2u} - \sum_{v=0}^{n-1} \beta^v \sum_{u=0}^{n-1} a_u \cdot b_u\, \beta^u. \qquad (3)$$

We will use the same notation here as we used for the schoolbook method except
that we will write the radix as $\beta = 2^W$ rather than as $\beta = 2^w$ for a reason that
will become apparent in Subsection 3.2.
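Formula (3) can be checked numerically. The sketch below is an illustration only; it assumes the reading of (3) in which the cross terms $(a_u + a_v)(b_u + b_v)\beta^{u+v}$ are summed over $u > v$, twice the diagonal sum is added, and the product of $\sum_v \beta^v$ with $\sum_u a_u b_u \beta^u$ is subtracted, which is what multiplying out requires. The function name is an assumption.

```python
def formula3(a, b, beta):
    """Evaluate A*B from symbol lists a, b (least significant first) via formula (3)."""
    n = len(a)
    cross = sum((a[u] + a[v]) * (b[u] + b[v]) * beta ** (u + v)
                for u in range(1, n) for v in range(u))
    diag = [a[u] * b[u] for u in range(n)]            # the n products a_u * b_u
    twice_diag = 2 * sum(diag[u] * beta ** (2 * u) for u in range(n))
    correction = (sum(beta ** v for v in range(n))
                  * sum(diag[u] * beta ** u for u in range(n)))
    return cross + twice_diag - correction


beta = 1 << 16
a = [0x1234, 0xFFFF, 0x0001, 0xABCD]
b = [0xFFFF, 0x8000, 0x7FFF, 0x0002]
value_a = sum(s * beta ** i for i, s in enumerate(a))
value_b = sum(s * beta ** i for i, s in enumerate(b))
print(formula3(a, b, beta) == value_a * value_b)  # True
```

Only n(n-1)/2 cross products and n diagonal products are formed, which is the source of the roughly halved multiplication count.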

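It is easy to check by multiplying out and combining terms that (3) gives
the correct result for multiplication. We note here for future use that $n(n-1)/2$
additions of W-bit integers are required to form the coefficients $a_u + a_v$ in (3) and
another $n(n-1)/2$ such additions are required to form the coefficients $b_u + b_v$. To
facilitate the counting of multiplications and further additions of W-bit integers
needed to implement the multiplication formula (3), it is convenient to write

$$a_u + a_v = a^{c}_{uv}\, \beta + a^{sum}_{uv},$$

where $a^{c}_{uv}$ is the carry bit and $a^{sum}_{uv}$ is the least significant W bits of the sum
of the W-bit integers $a_u$ and $a_v$. Using analogous notation for the sum of the
W-bit integers $b_u$ and $b_v$, we can write (3) in the manner

$$A \cdot B = \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} (a^{c}_{uv}\beta + a^{sum}_{uv}) \cdot (b^{c}_{uv}\beta + b^{sum}_{uv})\, \beta^{u+v} + 2 \sum_{u=0}^{n-1} a_u \cdot b_u\, \beta^{2u} - \sum_{v=0}^{n-1} \beta^v \sum_{u=0}^{n-1} a_u \cdot b_u\, \beta^u$$

or, equivalently,

$$\begin{aligned}
A \cdot B ={} & \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} a^{c}_{uv}\, b^{c}_{uv}\, \beta^{u+v+2} \\
& + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} \left( a^{c}_{uv}\, b^{sum}_{uv} + b^{c}_{uv}\, a^{sum}_{uv} \right) \beta^{u+v+1} \\
& + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} a^{sum}_{uv} \cdot b^{sum}_{uv}\, \beta^{u+v} \\
& + 2 \sum_{u=0}^{n-1} a_u \cdot b_u\, \beta^{2u} \\
& - \sum_{v=0}^{n-1} \beta^v \sum_{u=0}^{n-1} a_u \cdot b_u\, \beta^u. \qquad (4)
\end{aligned}$$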

The only multiplications of W-bit integers occur within the third, fourth
and fifth lines in (4). Each of the $\binom{n}{2} = \frac{n(n-1)}{2}$ terms in the third line requires
one such multiplication. Each of the n terms within the sum on u in the fourth
line also requires one such multiplication and these are the same products as
are required in the fifth line. Thus, to implement the multiplication formula (3)
requires a total of $\frac{n(n-1)}{2} + n = \frac{n(n+1)}{2}$ multiplications of W-bit integers, which
we note is about half that required by the schoolbook method when we choose
W = w as is required for a direct comparison.

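In counting additions of W-bit integers, we consider the worst case where all
the carry bits $a^{c}_{uv}$ and $b^{c}_{uv}$ are equal to 1. It is again convenient to write the
2W-bit integer $a_u \cdot b_u$ in (4) as $c_u\beta + d_u$ where $c_u$ and $d_u$ are W-bit integers,
and to write $a^{sum}_{uv} \cdot b^{sum}_{uv}$ in (4) as $c^{sum}_{uv}\beta + d^{sum}_{uv}$ where $c^{sum}_{uv}$ and $d^{sum}_{uv}$ are W-bit
integers. We can then rewrite (4) for this worst case as

$$\begin{aligned}
A \cdot B ={} & \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} \beta^{u+v+2} \\
& + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} a^{sum}_{uv}\, \beta^{u+v+1} + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} b^{sum}_{uv}\, \beta^{u+v+1} \\
& + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} c^{sum}_{uv}\, \beta^{u+v+1} + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} d^{sum}_{uv}\, \beta^{u+v} \\
& + \sum_{u=0}^{n-1} (c_u + c_u)\, \beta^{2u+1} + \sum_{u=0}^{n-1} (d_u + d_u)\, \beta^{2u} \\
& - \sum_{u=0}^{n} \sum_{v=0}^{n-1} (c_{u-1} + d_u)\, \beta^{u+v}
\end{aligned}$$

with the convention that $c_j = d_j = 0$ for $j < 0$ and for $j \ge n$. Upon setting
$e_u = c_{u-1} + d_u$ and then $u = i - v$ in the last line, we can rewrite this equivalently
as

$$\begin{aligned}
A \cdot B ={} & \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} \beta^{u+v+2} \\
& + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} a^{sum}_{uv}\, \beta^{u+v+1} + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} b^{sum}_{uv}\, \beta^{u+v+1} \\
& + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} c^{sum}_{uv}\, \beta^{u+v+1} + \sum_{u=1}^{n-1} \sum_{v=0}^{u-1} d^{sum}_{uv}\, \beta^{u+v} \\
& + \sum_{u=0}^{n-1} (c_u + c_u)\, \beta^{2u+1} + \sum_{u=0}^{n-1} (d_u + d_u)\, \beta^{2u} \\
& - \sum_{i=0}^{2n-1} \sum_{v=0}^{n-1} e_{i-v}\, \beta^{i}. \qquad (5)
\end{aligned}$$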



The terms $e_i = c_{i-1} + d_i$ for $i = 0, 1, \ldots, n$ require $n-1$ additions of W-bit
integers for their formation because the terms for $i = 0$ and $i = n$, namely $d_0$
and $c_{n-1}$ respectively, require no additions. We next consider the number of
additions of W-bit integers required to form the coefficients

$$s_i = \sum_{v=0}^{n-1} e_{i-v} \quad \text{for } i = 0, 1, \ldots, 2n-1$$

that appear in the fifth line of (5). We observe that we can rewrite this sum
separately over two ranges of the index as

$$s_i = \sum_{j=0}^{i} e_j = s_{i-1} + e_i \quad \text{for } i = 0, 1, \ldots, n-1 \qquad (6)$$

and

$$s_{2n-1-i} = \sum_{j=0}^{i} e_{n-j} = s_{2n-i} + e_{n-i} \quad \text{for } i = 0, 1, \ldots, n-1, \qquad (7)$$

where we have taken $s_{-1} = s_{2n} = 0$. We see that $n - 1$ additions of W-bit integers
are required to form the nested sums in (6) and another $n - 1$ such additions
are required to form the nested sums in (7). Hence a total of $3(n-1)$ additions
of W-bit integers are required to form all the coefficients in the fifth line of (5).

The summation in the first line of (5) concerns only carry bits, which we will
consider later. There are $n(n-1)$ terms in the second line, another $n(n-1)$
terms in the third line, 4n terms in the fourth line, and 2n terms in the fifth
line, for a total of $2n^2 + 4n$ terms. But there are only 2n distinct powers of $\beta$ in
(5) so that $2n^2 + 4n - 2n = 2n^2 + 2n$ additions of W-bit numbers are required
to combine the like powers of $\beta$. To this, we must add the $n(n-1)$ additions
of W-bit integers required to form the coefficients $a_u + a_v$ and $b_u + b_v$ in (3) as
well as the $3(n-1)$ additions required to form the coefficients in the last line
of (5). This gives a total of $3n^2 + 4n - 3$ additions of W-bit integers required
to implement the multiplication formula (3). We note that this is greater by
a factor of about $\frac{3}{2}$ than the $2n(n-1)$ additions required by the schoolbook
method when we choose W = w as is required for a direct comparison.

Finally, we note that the first line of (5) specifies $\frac{1}{2}n(n-1)$ additions of carry
bits (in this worst case) and each of the $3n^2 + 4n - 3$ additions of W-bit numbers
can also result in a carry bit. Thus, to implement the multiplication formula (3)
requires a maximum of $\frac{7}{2}n(n+1) - 3$ carry-bit additions.

3.2 Achieving Efficiency 

As we have just seen, the direct implementation of formula (3) for multiplication
of nW-bit integers requires only about half as many multiplications, but about
50% more additions, of W-bit integers compared to the schoolbook method
when we take W = w. To convert the multiplication formula (3) into an efficient
method for multiplication of N-bit integers on a w-bit processor, we first set
N = nsw and then split the problem of multiplication into (1) the problem of
multiplying N-bit integers using a virtual processor with word size W = sw,
followed by (2) the problem of implementing the necessary multiplications and
additions of W-bit integers using the actual processor with word size w. We solve
the first problem by implementing the multiplication formula (3), after which we
solve the second problem by implementing the necessary multiplications of sw-bit
integers by the schoolbook method. We now count the number of multiplications
and additions of w-bit integers required by this “hybrid method” for multiplying
N-bit integers.

As was shown in Subsection 3.1, the multiplication of N-bit integers, where
N = nW, according to the multiplication formula (3) requires $\frac{1}{2}n(n+1)$
multiplications of W-bit integers where W = sw. Each such multiplication when
performed by the schoolbook method requires $s^2$ multiplications and
$2s(s-1)$ additions of w-bit numbers, as well as $2s(s-1)$ carry-bit additions.
Thus, the multiplications performed in the first step of the hybrid method result
in the total numbers of w-bit integer operations shown in the following table:

    multiplications              additions             carry-bit additions
    $\frac{1}{2}s^2 n(n+1)$      $s(s-1)n(n+1)$        $s(s-1)n(n+1)$

The multiplication of N-bit integers, where N = nW, according to the
multiplication formula (3) requires $3n^2 + 4n - 3$ additions of W-bit integers
where W = sw. Each such addition is equivalent to s additions of w-bit integers
and s carry-bit additions. Thus, the additions performed in the first step of the
hybrid method result in the total numbers of w-bit integer operations shown in
the following table:

    multiplications              additions             carry-bit additions
    0                            $(3n^2 + 4n - 3)s$    $(3n^2 + 4n - 3)s$

Finally, the multiplication of N-bit integers, where N = nW, according to the
multiplication formula (3) requires in the worst case $\frac{7}{2}n(n+1) - 3$
carry-bit additions for W-bit integers where W = sw. Each such carry-bit
addition for sw-bit integers is equivalent to a single carry-bit addition for w-bit
integers, so that the carry-bit additions performed in the first step of the hybrid
method result in the total numbers of w-bit integer operations shown in the
following table:

    multiplications              additions             carry-bit additions
    0                            0                     $\frac{7}{2}n(n+1) - 3$

Tallying the counts in the three previous tables gives the figures shown in
Table 1.

For ease of comparison, Table 2 gives the counts for the schoolbook
method as calculated in Section 2.

3.3 Numerical Examples 

Example 1: Consider the multiplication of 1024-bit integers [where we note that
1024 is a length commonly used for current implementations of the RSA
cryptosystem and of the Diffie-Hellman key agreement protocol] on a processor with
word size w = 16 bits. As a basis for comparison, we assume that one 16-bit
addition constitutes 1 unit of computation, as does one carry-bit addition,
but that one 16-bit multiplication constitutes 2 units of computation.

The specifications N = nsw = 1024 and w = 16 give ns = 64 and hence
the allowed values of (n, s) are (1,64), (2,32), (4,16), (8,8), (16,4), (32,2) and
(64,1). Calculating the cost for each of these choices with the aid of the values
in Table 1 shows that the choice n = s = 8 yields the minimum cost of 16,457
computational units for the new software-efficient method, but the choice n = 4
and s = 16 is nearly as good with a cost of 16,739 units. For the choice n =
s = 8, the numbers of multiplications, additions and carry-bit additions are 2304,
5800 and 6049, respectively. By comparison, we calculate from Table 2 that the
schoolbook method has a cost of 24,320 computational units arising from the
4096 multiplications, 8064 additions, and 8064 carry-bit additions that must
be performed. In this example, the new software-efficient multiplication method
uses about one-third less computation than does the schoolbook method.

Table 1. Total counts of w-bit integer operations for the software-efficient
multiplication method for nsw-bit integers

    multiplications:       $\frac{1}{2}s^2 n(n+1)$
    additions:             $s[(s+2)n^2 + (s+3)n - 3]$
    carry-bit additions:   $s[(s+2)n^2 + (s+3)n - 3] + \frac{7}{2}n(n+1) - 3$

Table 2. Total counts of w-bit integer operations for the schoolbook multiplication
method for nsw-bit integers

    multiplications:       $(sn)^2$
    additions:             $2sn(sn - 1)$
    carry-bit additions:   $2sn(sn - 1)$

Example 2: Consider the multiplication of 192-bit integers [which is one of the 
lengths for the Elliptic curve system recommended for the FIPS 186-2 standard] 
on a processor with word size w = 8 bits. Again we assume that one 8-bit 
addition or one carry-bit addition constitutes 1 unit of computation, but that one
8-bit multiplication constitutes 2 units of computation. 

The specifications N = nsw = 192 and w = 8 give ns = 24 and hence the 
allowed values of (n,s) are (1,24), (2,12), (3,8), (4,6), (6,4), (8,3), (12,2) and 
(24, 1). Calculating the cost for each of these choices with the aid of the values 
in Table 1 shows that the choice n = 4 and s = 6 yields the minimum cost of 
2719 computational units for the new software-efficient method, but the choice 
n = 3 and s = 8 is virtually as good with a cost of 2727 units. For the choice 
n = 4 and s = 6, the number of multiplications, additions and carry-bit additions 
are 360, 966 and 1033, respectively. By comparison, we calculate from Table 2 
that the schoolbook method has a cost of 3360 computational units arising from 
the 576 multiplications, 1104 additions, and 1104 carry-bit additions that must 
be performed. In this example, the new software-efficient multiplication method 
uses about 19% less computation than does the schoolbook method. 
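
The figures quoted in Examples 1 and 2 can be reproduced with a short script. The following is only a sketch: the function names are hypothetical, the cost model (2 units per multiplication, 1 unit per addition or carry-bit addition) is the one assumed in the examples, and the coefficient 7/2 in the carry-bit count is the value consistent with the operation counts quoted in both examples.

```python
def hybrid_counts(n, s):
    """(multiplications, additions, carry-bit additions) per Table 1."""
    mult = s * s * n * (n + 1) // 2
    add = s * ((s + 2) * n * n + (s + 3) * n - 3)
    # 7/2 * n(n+1) - 3: coefficient inferred from the worked examples
    carry = add + 7 * n * (n + 1) // 2 - 3
    return mult, add, carry


def schoolbook_counts(n, s):
    """(multiplications, additions, carry-bit additions) per Table 2."""
    m = s * n
    return m * m, 2 * m * (m - 1), 2 * m * (m - 1)


def cost(counts, mult_weight=2):
    """Computational units: multiplications weigh 2, the rest weigh 1."""
    mult, add, carry = counts
    return mult_weight * mult + add + carry


def best_split(words):
    """Factorization ns = words that minimizes the hybrid-method cost."""
    pairs = [(n, words // n) for n in range(1, words + 1) if words % n == 0]
    return min(pairs, key=lambda ns: cost(hybrid_counts(*ns)))


if __name__ == "__main__":
    for words in (64, 24):  # Example 1 (1024/16) and Example 2 (192/8)
        n, s = best_split(words)
        print(words, (n, s), cost(hybrid_counts(n, s)),
              cost(schoolbook_counts(n, s)))
```

Under these assumptions the script selects (n, s) = (8, 8) for 64 words and (4, 6) for 24 words, matching the choices reported in the examples.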

It should be pointed out that actual performance results for the new
software-efficient multiplication method may well be substantially better than
predicted by our analysis, which was made using worst-case assumptions. For instance, our
8-bit implementation of the new software-efficient multiplication method on a 
Pentium 2 processor for the parameters of Example 2 actually used about 40% 
less computation than did the schoolbook method, rather than only 19% less as 
our analysis had predicted. 
4 A Hardware-Efficient Multiplication Method

The following formula, analogous to (3), is the basis for our new hardware- 
efficient method of multiplication: 

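$$A \cdot B = \Big(\sum_{u=0}^{n-1} a_u b_u 2^{uW}\Big)\Big(\sum_{v=0}^{n-1} 2^{vW}\Big) + \sum_{u=1}^{n-1}\sum_{v=0}^{u-1} (a_u - a_v)(b_v - b_u)\, 2^{(u+v)W}. \qquad (8)$$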

The complexity of implementing multiplication according to (8) is comparable to
that for implementing multiplication according to (3). Which method is superior
depends on the computational environment. For example, using (8) will give
fewer carry-bit additions but will require sign checks. In general the use of (8)
is better suited to hardware implementations and therefore we now analyze the
use of (8) in a hardware implementation by estimating the number of clock
cycles needed to multiply two N-bit numbers A and B. We will compare this
performance to a hardware implementation of the schoolbook method using the
shift-and-add technique, which requires N clock cycles when all N bits can be
processed in parallel.

Let N = nW and consider the multiplication formula (8) where $a_i$ and $b_i$ are
W-bit numbers. Calculating all n(n - 1) required differences $(a_i - a_j)$ and $(b_i - b_j)$
in the second double summation of (8) requires $n(n-1)/n = n - 1$ clock cycles if

the same resources as for the schoolbook method are used. Formula (8) requires
$\frac{1}{2}n(n+1)$ multiplications of W-bit numbers. Because n such multiplications can be
performed in parallel, another $\lceil (n+1)/2 \rceil W$ clock cycles are needed. Summing the
results of these multiplications requires in the worst case an additional $\frac{1}{2}n(n+1)$
clock cycles. The total number of clock cycles required for the multiplication
A · B is thus

$$(n - 1) + \left\lceil \frac{n+1}{2} \right\rceil W + \frac{n(n+1)}{2}, \qquad (9)$$

which is about half that required by the schoolbook method for large N = nW. 

Example 3: Suppose that A and B are 1024-bit numbers and consider the choice
W = 128 and n = 8. Multiplication according to (9) requires at most 683 clock
cycles compared to 1024 clock cycles for the schoolbook method using the
shift-and-add technique, a reduction of 33%.

5 Conclusion 

The analyses of the new software-efficient multiplication method and of the new 
hardware-efficient multiplication method both show that a significant perfor- 
mance improvement over the schoolbook method can be obtained for all current 
applications in public key cryptography. Moreover, our complexity estimates for
the new methods are conservative; actual gains can exceed those predicted, as
was pointed out in Example 2.
Abstract. We propose a new method to compute the x-coordinate of kP + lQ
simultaneously on the elliptic curve with Montgomery form over $\mathbb{F}_p$
without precomputed points. Computing the x-coordinate of kP + lQ is
required in ECDSA signature verification. The proposed method is about
25% faster than the method using scalar multiplication and the recovery
of the Y-coordinate of kP and lQ on the elliptic curve with Montgomery
form over $\mathbb{F}_p$, and also slightly faster than the simultaneous scalar
multiplication on the elliptic curve with Weierstrass form over $\mathbb{F}_p$ using
NAF and mixed coordinates. Furthermore, our method is applicable to the
Montgomery method on elliptic curves over $\mathbb{F}_{2^n}$.



1 Introduction 

Elliptic curve cryptography was first proposed by Koblitz [10] and Miller [15]. In
recent years, efficient algorithms and implementation techniques for elliptic curves
over $\mathbb{F}_p$ [3,4], $\mathbb{F}_{2^n}$ [7,13] and $\mathbb{F}_{p^n}$ [2,9,12] have been investigated. In particular,
the scalar multiplication on the elliptic curve with Montgomery form over $\mathbb{F}_p$
can be computed efficiently without precomputed points [16], and is immune to
timing attacks [11,17]. This method has been extended to elliptic curves over $\mathbb{F}_{2^n}$ [14].

In Elliptic Curve Digital Signature Algorithm (ECDSA) signature verification [1],
we need to compute kP + lQ, where P and Q are points on the elliptic
curve and k, l are integers less than the order of the base point. On the
elliptic curve with Weierstrass form, kP + lQ can be efficiently computed by a
simultaneous multiple point multiplication [3,7,19], which we call a simultaneous
scalar multiplication. On the other hand, a simultaneous scalar multiplication
on the elliptic curve with Montgomery form has not been proposed yet. We
therefore propose one and call it Montgomery simultaneous scalar multiplication. This
method is about 25% faster than the method using Montgomery scalar
multiplication and the recovery of the Y-coordinate of kP and lQ, and about 1% faster
than Weierstrass simultaneous scalar multiplication over $\mathbb{F}_p$ using NAF [8] and
mixed coordinates [4]. Moreover, our method is applicable to elliptic curves over
$\mathbb{F}_{2^n}$.

This paper is organized as follows. Section 2 presents preliminaries including
arithmetic over the elliptic curve with Montgomery form and Weierstrass form.
In Section 3, we describe the new method, Montgomery simultaneous scalar
multiplication. Section 4 compares our method with others and
Section 5 presents implementation results. We then apply our method to elliptic
curves over $\mathbb{F}_{2^n}$ in Section 6 and conclude in Section 7.

2 Preliminaries 

2.1 Elliptic Curve with Montgomery Form 

Montgomery introduced a new form of elliptic curve over $\mathbb{F}_p$ [16]. For $A, B \in \mathbb{F}_p$,
the elliptic curve with Montgomery form $E_M$ is represented by

$$E_M : By^2 = x^3 + Ax^2 + x \quad ((A^2 - 4)B \neq 0). \qquad (1)$$

We remark that the order of any elliptic curve with Montgomery form is always
divisible by 4.

In affine coordinates (x, y), the x-coordinate of the sum of two points on
$E_M$ can be computed without the y-coordinates of these points if the difference
between these points is known. Affine coordinates (x, y) can be transformed into
projective coordinates (X, Y, Z) by x = X/Z, y = Y/Z. Equation (1) can also be
transformed as

$$E_M : BY^2 Z = X^3 + AX^2 Z + XZ^2.$$

Let $P_0 = (X_0, Y_0, Z_0)$ and $P_1 = (X_1, Y_1, Z_1)$ be points on $E_M$ and
$P_2 = (X_2, Y_2, Z_2) = P_1 + P_0$, $P_3 = (X_3, Y_3, Z_3) = P_1 - P_0$. Addition formulas and
doubling formulas are described as follows.

Addition formulas $P_2 = P_1 + P_0$ ($P_1 \neq P_0$)

$X_2 = Z_3\big((X_0 - Z_0)(X_1 + Z_1) + (X_0 + Z_0)(X_1 - Z_1)\big)^2$
$Z_2 = X_3\big((X_0 - Z_0)(X_1 + Z_1) - (X_0 + Z_0)(X_1 - Z_1)\big)^2$    (2)

Doubling formulas $P_2 = 2P_0$

$4X_0Z_0 = (X_0 + Z_0)^2 - (X_0 - Z_0)^2$
$X_2 = (X_0 + Z_0)^2 (X_0 - Z_0)^2$
$Z_2 = (4X_0Z_0)\big((X_0 - Z_0)^2 + ((A + 2)/4)(4X_0Z_0)\big)$

$P_2 = (X_2, Z_2)$ can be computed without the Y-coordinate. Since the computational
cost of a field addition and subtraction is much lower than that of a field
multiplication and squaring, we can ignore it. The computational cost of the addition
formulas is 4M + 2S, where M and S respectively denote that of a field
multiplication and squaring. If $Z_3 = 1$, the computational cost of the addition formulas
is 3M + 2S. If (A + 2)/4 is precomputed, the computational cost of the doubling
formulas is also 3M + 2S.

Let $(k_t \cdots k_1 k_0)_2$ be the binary representation of k with $k_t = 1$. To compute
the scalar multiplication kP from P = (x, y), we hold $\{m_iP, (m_i + 1)P\}$ for
$m_i = (k_t \cdots k_i)_2$. If $k_i = 0$, $m_iP = 2m_{i+1}P$ and $(m_i + 1)P = (m_{i+1} + 1)P + m_{i+1}P$.
Otherwise, $m_iP = (m_{i+1} + 1)P + m_{i+1}P$ and $(m_i + 1)P = 2(m_{i+1} + 1)P$. We can
compute {kP, (k + 1)P} from {P, 2P}. Montgomery scalar multiplication requires
the addition formulas t - 1 times and the doubling formulas t times. Since the
difference between $(m_{i+1} + 1)P$ and $m_{i+1}P$ is P, we can assume $(X_3, Z_3) =
(x, 1)$ in addition formulas (2). The computational cost of Montgomery scalar
multiplication kP is (6|k| - 3)M + (4|k| - 2)S, where |k| is the bit length of k.
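
The ladder described above can be sketched in a few lines. This is an illustration only, not the authors' implementation: the toy parameters p = 101, A = 3, B = 1 and all function names are assumptions chosen for the example, and the affine addition routine is included solely as an independent reference against which the x-only result can be compared.

```python
p, A, B = 101, 3, 1  # hypothetical toy curve B*y^2 = x^3 + A*x^2 + x over F_101


def inv(a):
    """Modular inverse in F_p via Fermat's little theorem."""
    return pow(a, p - 2, p)


def xadd(P0, P1, P3):
    """Addition formulas (2): (X:Z) of P1 + P0 given P3 = P1 - P0."""
    (X0, Z0), (X1, Z1), (X3, Z3) = P0, P1, P3
    u = (X0 - Z0) * (X1 + Z1)
    v = (X0 + Z0) * (X1 - Z1)
    return (Z3 * (u + v) ** 2 % p, X3 * (u - v) ** 2 % p)


def xdbl(P0):
    """Doubling formulas: (X:Z) of 2*P0."""
    X0, Z0 = P0
    s, d = (X0 + Z0) ** 2, (X0 - Z0) ** 2
    c = s - d  # equals 4*X0*Z0
    return (s * d % p, c * (d + (A + 2) * inv(4) * c) % p)


def ladder_x(k, x):
    """x-coordinate of kP for k >= 1 and P = (x, y) with x != 0 (None = infinity)."""
    R0, R1 = (x, 1), xdbl((x, 1))  # the pair {P, 2P}
    for i in range(k.bit_length() - 2, -1, -1):
        if (k >> i) & 1:
            R0, R1 = xadd(R0, R1, (x, 1)), xdbl(R1)
        else:
            R0, R1 = xdbl(R0), xadd(R0, R1, (x, 1))
    X, Z = R0
    return None if Z % p == 0 else X * inv(Z) % p


def affine_add(P, Q):
    """Reference affine addition on the Montgomery curve (None = infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + 2 * A * x1 + 1) * inv(2 * B * y1) % p
    else:
        lam = (y2 - y1) * inv(x2 - x1) % p
    x3 = (B * lam * lam - A - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)
```

Each ladder step performs one differential addition and one doubling, and the difference point is always P itself, which is why $(X_3, Z_3) = (x, 1)$ can be used throughout.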

In ECDSA signature verification, we need to compute the x-coordinate of kP +
lQ, where P, Q are points on the elliptic curve and k, l are integers less than
the order of the base point. kP and lQ can be computed using Montgomery
scalar multiplication, but kP + lQ cannot be computed from kP and lQ using
formulas (2) because the difference between kP and lQ is unknown. Therefore,
the recovery of the Y-coordinate of kP and lQ is required to compute kP + lQ from
kP and lQ using other addition formulas. The method of recovering the Y-coordinate
is described in [18]. If $kP = (X_0, Z_0)$, $(k + 1)P = (X_1, Z_1)$ and P = (x, y), we
can recover the Y-coordinate of kP = (X, Y, Z) as:



$X = 2ByZ_0Z_1X_0$

$Y = Z_1\big((X_0 + xZ_0 + 2AZ_0)(X_0x + Z_0) - 2AZ_0^2\big) - (X_0 - xZ_0)^2X_1$

$Z = 2ByZ_0Z_1Z_0$

The computational cost of recovering the Y-coordinate is 12M + S.

To compute the x-coordinate of kP + lQ, we require these 6 steps.

Step 1 Compute kP using Montgomery scalar multiplication
Step 2 Recover the Y-coordinate of kP
Step 3 Compute lQ using Montgomery scalar multiplication
Step 4 Recover the Y-coordinate of lQ
Step 5 Compute kP + lQ from kP and lQ in projective coordinates
Step 6 Compute the x-coordinate of kP + lQ using x = X/Z

The computational cost of Step 5 is 10M + 2S and that of Step 6 is M + I,
where I denotes that of a field inversion. We can assume |k| = |l| without loss of
generality. The computational cost of the x-coordinate of kP + lQ is then
(12|k| + 29)M + 8|k|S + I.

2.2 Simultaneous Scalar Multiplication on Elliptic Curve 
with Weierstrass Form 

For $a, b \in \mathbb{F}_p$, the elliptic curve with Weierstrass form $E_W$ is represented by

$$E_W : y^2 = x^3 + ax + b \quad (4a^3 + 27b^2 \neq 0).$$

We remark that all elliptic curves with Montgomery form can be transformed
into Weierstrass form, but not all elliptic curves with Weierstrass form can be
transformed into Montgomery form.

kP + lQ can be computed simultaneously on the elliptic curve with Weierstrass
form without precomputed points [19]. This method is known as Shamir’s
trick [5]. On the elliptic curve with Weierstrass form over $\mathbb{F}_p$, the most effective
method for computing kP + lQ without precomputed points is the simultaneous
scalar multiplication using non-adjacent form (NAF) $(k'_{t'} \cdots k'_1 k'_0)$, where
$k'_i \in \{0, \pm 1\}$ ($0 \le i \le t'$), and mixed coordinates [4]. In [3], Weierstrass
simultaneous scalar multiplication using the window method and mixed coordinates
is described. This method is faster than Weierstrass simultaneous scalar
multiplication using NAF, but requires much more memory to store the points.
That is why we pick Weierstrass simultaneous scalar multiplication using
NAF and mixed coordinates in this section.

NAF has the property that no two consecutive coefficients $k'_i$ are non-zero
and the average density of non-zero coefficients is approximately 1/3. In mixed
coordinates, we use the addition formulas of $J^m \leftarrow J + A$ for the additions,
the doubling formulas of $J \leftarrow 2J^m$ for the doublings ahead of addition, and the
doubling formulas of $J^m \leftarrow 2J^m$ for the doublings ahead of doubling, where
$J$, $A$, and $J^m$ respectively denote Jacobian coordinates, affine coordinates, and
modified Jacobian coordinates. This method is described as follows.



Algorithm 1: Weierstrass Simultaneous Scalar Multiplication using NAF and
mixed coordinates

Input: $k = (k_t \cdots k_1 k_0)_2$, $l = (l_t \cdots l_1 l_0)_2$, $P, Q \in E_W$ ($k_t$ or $l_t = 1$).

Output: x-coordinate of W = kP + lQ.

1. Compute P + Q, P - Q.

2. Let $(k'_{t'} \cdots k'_1 k'_0)$ and $(l'_{t'} \cdots l'_1 l'_0)$ be the NAF of k and l ($k'_{t'}$ or $l'_{t'} = 1$).

3. $W \leftarrow k'_{t'}P + l'_{t'}Q$.

4. For i from t' - 1 downto 0 do

   4.1 if $(k'_i, l'_i) = (0, 0)$ then
       $W \leftarrow 2W$ ($J^m \leftarrow 2J^m$);

   4.2 else
       $W \leftarrow 2W$ ($J \leftarrow 2J^m$),
       $W \leftarrow W + (k'_iP + l'_iQ)$ ($J^m \leftarrow J + A$).

5. Compute x-coordinate of W.



At step 1, P + Q and P - Q are computed in affine coordinates and their
computational cost is 4M + 2S + I. In mixed coordinates, the computational costs of
the addition formulas of $J^m \leftarrow J + A$, the doubling formulas of $J \leftarrow 2J^m$, and
the doubling formulas of $J^m \leftarrow 2J^m$ are respectively 9M + 5S, 3M + 4S, and
4M + 4S. Since the probability that $(k'_i, l'_i) = (0, 0)$ is $(1 - 1/3)^2 = 4/9$, we repeat
step 4.1 4|k|/9 times and step 4.2 5|k|/9 times if t' = t + 1. The computational
cost of step 4.1 is (4|k|/9) · (4M + 4S) and that of step 4.2 is (5|k|/9) · (12M + 9S).
This shows that the computational cost of step 4 is 76|k|/9 · M + 61|k|/9 · S. The
computational cost of step 5 is M + S + I by $x = X/Z^2$. Therefore, the computational
cost of the x-coordinate of kP + lQ is (76|k| + 45)/9 · M + (61|k| + 27)/9 · S + 2I.
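
For readers unfamiliar with the NAF recoding used in Algorithm 1, the following sketch shows one standard way to compute it and to measure the density of non-zero digits empirically. It is not taken from the paper; the function name and the digit order (least significant first) are choices made for this example.

```python
def naf(k):
    """Non-adjacent form of k >= 0, least significant digit first.

    Each digit is in {-1, 0, 1} and no two consecutive digits are non-zero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d           # k is now divisible by 4, forcing a zero digit next
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


if __name__ == "__main__":
    import random

    random.seed(1)
    samples = [naf(random.getrandbits(160)) for _ in range(200)]
    nonzero = sum(sum(1 for d in ds if d) for ds in samples)
    total = sum(len(ds) for ds in samples)
    print("non-zero digit density:", nonzero / total)  # expected to be near 1/3
```

The cost analysis above treats the digits of the two recoded scalars as independent, which is where the probability $(1 - 1/3)^2 = 4/9$ for a joint zero comes from.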
3 Proposed Method: Montgomery Simultaneous Scalar Multiplication

Now we propose the new method to compute kP + lQ simultaneously on the
elliptic curve with Montgomery form over $\mathbb{F}_p$.

First, we define a set of four points $G_i$,

$$G_i = \{\, m_iP + n_iQ,\;\; m_iP + (n_i + 1)Q,\;\; (m_i + 1)P + n_iQ,\;\; (m_i + 1)P + (n_i + 1)Q \,\} \qquad (3)$$

for $m_i = (k_t \cdots k_i)_2$, $n_i = (l_t \cdots l_i)_2$. Now, we present how to compute $G_i$ from
$G_{i+1}$ in every case of $(k_i, l_i)$.

1. $(k_i, l_i) = (0, 0)$

Since $m_i = 2m_{i+1}$ and $n_i = 2n_{i+1}$, we can compute all elements of $G_i$ from
$G_{i+1}$ as:

$m_iP + n_iQ = 2(m_{i+1}P + n_{i+1}Q)$
$m_iP + (n_i + 1)Q = (m_{i+1}P + (n_{i+1} + 1)Q) + (m_{i+1}P + n_{i+1}Q)$
$(m_i + 1)P + n_iQ = ((m_{i+1} + 1)P + n_{i+1}Q) + (m_{i+1}P + n_{i+1}Q)$
$(m_i + 1)P + (n_i + 1)Q = ((m_{i+1} + 1)P + n_{i+1}Q) + (m_{i+1}P + (n_{i+1} + 1)Q)$

All elements of $G_i$ can be computed without $(m_{i+1} + 1)P + (n_{i+1} + 1)Q \in G_{i+1}$.

2. $(k_i, l_i) = (0, 1)$

Since $m_i = 2m_{i+1}$ and $n_i = 2n_{i+1} + 1$, we can compute all elements of $G_i$
from $G_{i+1}$ as:

$m_iP + n_iQ = (m_{i+1}P + (n_{i+1} + 1)Q) + (m_{i+1}P + n_{i+1}Q)$
$m_iP + (n_i + 1)Q = 2(m_{i+1}P + (n_{i+1} + 1)Q)$
$(m_i + 1)P + n_iQ = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + (m_{i+1}P + n_{i+1}Q)$
$(m_i + 1)P + (n_i + 1)Q = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + (m_{i+1}P + (n_{i+1} + 1)Q)$

All elements of $G_i$ can be computed without $(m_{i+1} + 1)P + n_{i+1}Q \in G_{i+1}$.

3. $(k_i, l_i) = (1, 0)$

Since $m_i = 2m_{i+1} + 1$ and $n_i = 2n_{i+1}$, we can compute all elements of $G_i$
from $G_{i+1}$ as:

$m_iP + n_iQ = ((m_{i+1} + 1)P + n_{i+1}Q) + (m_{i+1}P + n_{i+1}Q)$
$m_iP + (n_i + 1)Q = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + (m_{i+1}P + n_{i+1}Q)$
$(m_i + 1)P + n_iQ = 2((m_{i+1} + 1)P + n_{i+1}Q)$
$(m_i + 1)P + (n_i + 1)Q = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + ((m_{i+1} + 1)P + n_{i+1}Q)$

All elements of $G_i$ can be computed without $m_{i+1}P + (n_{i+1} + 1)Q \in G_{i+1}$.

4. $(k_i, l_i) = (1, 1)$

Since $m_i = 2m_{i+1} + 1$ and $n_i = 2n_{i+1} + 1$, we can compute all elements of
$G_i$ from $G_{i+1}$ as:

$m_iP + n_iQ = ((m_{i+1} + 1)P + n_{i+1}Q) + (m_{i+1}P + (n_{i+1} + 1)Q)$
$m_iP + (n_i + 1)Q = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + (m_{i+1}P + (n_{i+1} + 1)Q)$
$(m_i + 1)P + n_iQ = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + ((m_{i+1} + 1)P + n_{i+1}Q)$
$(m_i + 1)P + (n_i + 1)Q = 2((m_{i+1} + 1)P + (n_{i+1} + 1)Q)$

All elements of $G_i$ can be computed without $m_{i+1}P + n_{i+1}Q \in G_{i+1}$.

In every case, all elements of $G_i$ can be computed from $G_{i+1}$ without
$(m_{i+1} + 1 - k_i)P + (n_{i+1} + 1 - l_i)Q \in G_{i+1}$. When we define a set of three points $G'_i$,

$$G'_i = G_i - \{(m_i + 1 - k_{i-1})P + (n_i + 1 - l_{i-1})Q\}, \qquad (4)$$

all elements of $G_i$ can be computed from $G'_{i+1}$. Therefore, we can compute $G'_i$
from $G'_{i+1}$. The way to compute $G'_i$ from $G'_{i+1}$ depends on $(k_i, l_i, k_{i-1}, l_{i-1})$,
since computing $G_i$ from $G'_{i+1}$ depends on $(k_i, l_i)$ while extracting $G'_i$ from $G_i$
depends on $(k_{i-1}, l_{i-1})$.

Example 1 $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 0, 0, 0)$

$m_i$, $n_i$, $G'_{i+1}$ and $G'_i$ would be described as: $m_i = 2m_{i+1}$, $n_i = 2n_{i+1}$,

$G'_{i+1} = \{\, m_{i+1}P + n_{i+1}Q,\;\; m_{i+1}P + (n_{i+1} + 1)Q,\;\; (m_{i+1} + 1)P + n_{i+1}Q \,\}$,
$G'_i = \{\, m_iP + n_iQ,\;\; m_iP + (n_i + 1)Q,\;\; (m_i + 1)P + n_iQ \,\}$.

Therefore, we can compute $G'_i$ from $G'_{i+1}$ as:

$m_iP + n_iQ = 2(m_{i+1}P + n_{i+1}Q)$
$m_iP + (n_i + 1)Q = (m_{i+1}P + (n_{i+1} + 1)Q) + (m_{i+1}P + n_{i+1}Q)$    (5)
$(m_i + 1)P + n_iQ = ((m_{i+1} + 1)P + n_{i+1}Q) + (m_{i+1}P + n_{i+1}Q)$

If we define $G'_i = \{T_0[i], T_1[i], T_2[i]\}$, equations (5) can be described as:

$T_0[i] = 2T_0[i + 1]$
$T_1[i] = T_1[i + 1] + T_0[i + 1]$    ($T_1[i + 1] - T_0[i + 1] = Q$)
$T_2[i] = T_2[i + 1] + T_0[i + 1]$    ($T_2[i + 1] - T_0[i + 1] = P$).
Example 2 $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 1, 1, 0)$

$m_i$, $n_i$, $G'_{i+1}$ and $G'_i$ would be described as: $m_i = 2m_{i+1}$, $n_i = 2n_{i+1} + 1$,

$G'_{i+1} = \{\, m_{i+1}P + n_{i+1}Q,\;\; m_{i+1}P + (n_{i+1} + 1)Q,\;\; (m_{i+1} + 1)P + (n_{i+1} + 1)Q \,\}$,
$G'_i = \{\, m_iP + n_iQ,\;\; (m_i + 1)P + n_iQ,\;\; (m_i + 1)P + (n_i + 1)Q \,\}$.

Therefore, we can compute $G'_i$ from $G'_{i+1}$ as:

$m_iP + n_iQ = (m_{i+1}P + (n_{i+1} + 1)Q) + (m_{i+1}P + n_{i+1}Q)$
$(m_i + 1)P + n_iQ = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + (m_{i+1}P + n_{i+1}Q)$    (6)
$(m_i + 1)P + (n_i + 1)Q = ((m_{i+1} + 1)P + (n_{i+1} + 1)Q) + (m_{i+1}P + (n_{i+1} + 1)Q)$

Equations (6) can be described as:

$T_0[i] = T_1[i + 1] + T_0[i + 1]$    ($T_1[i + 1] - T_0[i + 1] = Q$)
$T_1[i] = T_2[i + 1] + T_0[i + 1]$    ($T_2[i + 1] - T_0[i + 1] = P + Q$)
$T_2[i] = T_2[i + 1] + T_1[i + 1]$    ($T_2[i + 1] - T_1[i + 1] = P$)

From equations (3) and (4), we can define $G'_{t+1}$ as an initial set of $G'_i$ as:

$G_{t+1} = \{O, Q, P, P + Q\}$
$G'_{t+1} = G_{t+1} - \{(1 - k_t)P + (1 - l_t)Q\}$,

where O is the point at infinity. By calculating $G'_i$ from $G'_{i+1}$ repeatedly, we can
compute $G'_1$ from $G'_{t+1}$, whereas kP + lQ will be computed from $G'_1$. Our method
to compute the x-coordinate of kP + lQ is described in Algorithm 2, where
$T_i + T_j$ (P) means that the difference between $T_i$ and $T_j$ is P.

At step 1, P - Q must be computed because the difference between
$(m_i + 1)P + n_iQ$ and $m_iP + (n_i + 1)Q$ is P - Q.

We now consider the computational cost of the proposed method. At step 1,
we compute P + Q and P - Q in affine coordinates and their computational cost is
4M + 2S + I. At steps 2, 3 and 4, we use projective coordinates in the same
way as in Section 2.1. We assume |k| = |l| as in the previous section.
At step 3, we require the addition formulas twice and the doubling formulas
once, or the addition formulas three times, per bit of k. In either case, since the
computational cost per bit of k is 9M + 6S, the computational cost of step 3 is
9(|k| - 1)M + 6(|k| - 1)S. The computational cost of step 4 is 3M + 2S and that
of step 5 is M + I. Therefore, the computational cost of the proposed method is
(9|k| - 1)M + (6|k| - 2)S + 2I.
Algorithm 2: Montgomery Simultaneous Scalar Multiplication

Input: $k = (k_t \cdots k_1 k_0)_2$, $l = (l_t \cdots l_1 l_0)_2$, $P, Q \in E_M$ ($k_t$ or $l_t = 1$).

Output: x-coordinate of W = kP + lQ.

1. Compute P + Q, P - Q.

2. If $(k_t, l_t) = (0, 1)$ then: T0 ← O, T1 ← Q, T2 ← P + Q;
   else if $(k_t, l_t) = (1, 0)$ then: T0 ← O, T1 ← P, T2 ← P + Q;
   else: T0 ← Q, T1 ← P, T2 ← P + Q.

3. For i from t downto 1 do

   3.1. If $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 0, 0, 0)$ then:
        T2 ← T2 + T0 (P), T1 ← T1 + T0 (Q), T0 ← 2T0;
   3.2. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 0, 0, 1)$ then:
        T2 ← T2 + T1 (P - Q), T1 ← T1 + T0 (Q), T0 ← 2T0;
   3.3. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 0, 1, 0)$ then:
        T ← T1, T1 ← T2 + T0 (P), T0 ← 2T0, T2 ← T2 + T (P - Q);
   3.4. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 0, 1, 1)$ then:
        T ← T1, T1 ← T2 + T0 (P), T0 ← T + T0 (Q), T2 ← T2 + T (P - Q);
   3.5. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 1, 0, 0)$ then:
        T2 ← T2 + T0 (P + Q), T0 ← T1 + T0 (Q), T1 ← 2T1;
   3.6. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 1, 0, 1)$ then:
        T2 ← T2 + T1 (P), T0 ← T1 + T0 (Q), T1 ← 2T1;
   3.7. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 1, 1, 0)$ then:
        T ← T1, T1 ← T2 + T0 (P + Q), T0 ← T + T0 (Q), T2 ← T2 + T (P);
   3.8. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (0, 1, 1, 1)$ then:
        T ← T1, T1 ← T2 + T0 (P + Q), T0 ← 2T, T2 ← T2 + T (P);
   3.9. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (1, 0, 0, 0)$ then:
        T ← T1, T1 ← T2 + T0 (P + Q), T0 ← T + T0 (P), T2 ← 2T;
   3.10. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (1, 0, 0, 1)$ then:
        T ← T1, T1 ← T2 + T0 (P + Q), T0 ← T + T0 (P), T2 ← T2 + T (Q);
   3.11. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (1, 0, 1, 0)$ then:
        T0 ← T1 + T0 (P), T2 ← T2 + T1 (Q), T1 ← 2T1;
   3.12. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (1, 0, 1, 1)$ then:
        T0 ← T2 + T0 (P + Q), T2 ← T2 + T1 (Q), T1 ← 2T1;
   3.13. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (1, 1, 0, 0)$ then:
        T ← T1, T1 ← T2 + T0 (P), T0 ← T + T0 (P - Q), T2 ← T2 + T (Q);
   3.14. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (1, 1, 0, 1)$ then:
        T ← T1, T1 ← T2 + T0 (P), T0 ← T + T0 (P - Q), T2 ← 2T2;
   3.15. else if $(k_i, l_i, k_{i-1}, l_{i-1}) = (1, 1, 1, 0)$ then:
        T0 ← T1 + T0 (P - Q), T1 ← T2 + T1 (Q), T2 ← 2T2;
   3.16. else:
        T0 ← T2 + T0 (P), T1 ← T2 + T1 (Q), T2 ← 2T2.

4. If $(k_0, l_0) = (0, 0)$ then W ← 2T0;
   else if $(k_0, l_0) = (0, 1)$ then W ← T1 + T0 (Q);
   else if $(k_0, l_0) = (1, 0)$ then W ← T1 + T0 (P);
   else W ← T1 + T0 (P - Q).

5. Compute x-coordinate of W by x = X/Z.
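
The bookkeeping behind Algorithm 2 can be checked without any field arithmetic by tracking each point aP + bQ as the integer pair (a, b). The sketch below is a restatement in terms of the sets $G_i$ and $G'_i$ of Section 3 rather than a transcription of the sixteen cases; the function name and the pair representation are assumptions made for the illustration. It asserts that every addition it performs has a difference in {P, Q, P + Q, P - Q}, which is the condition needed to apply addition formulas (2), and that the operand it needs is never the point that was dropped.

```python
def simultaneous_ladder(k, l):
    """Symbolic run of the simultaneous ladder for k, l >= 0, not both zero.

    A point a*P + b*Q is represented by the pair (a, b). Returns the final
    pair and the number of additions/doublings performed.
    """
    known = {(1, 0), (0, 1), (1, 1), (1, -1)}  # P, Q, P+Q, P-Q
    t = max(k.bit_length(), l.bit_length()) - 1

    def bit(x, i):
        return (x >> i) & 1

    offsets = [(e, f) for e in (0, 1) for f in (0, 1)]
    # G_{t+1} = {O, Q, P, P+Q}, indexed by offset (e, f) -> (m+e)P + (n+f)Q
    G = {o: o for o in offsets}
    del G[(1 - bit(k, t), 1 - bit(l, t))]  # G'_{t+1}, equation (4)
    ops = 0
    for i in range(t, -1, -1):
        ki, li = bit(k, i), bit(l, i)
        if i > 0:  # keep the three points of G'_i
            drop = (1 - bit(k, i - 1), 1 - bit(l, i - 1))
            keep = [o for o in offsets if o != drop]
        else:      # last step: only kP + lQ itself is needed
            keep = [(0, 0)]
        new = {}
        for e, f in keep:
            sp, sq = ki + e, li + f
            if sp == 1 and sq == 1 and ki == li:
                u, v = (1, 0), (0, 1)  # difference P - Q
            else:
                u = ((sp + 1) // 2, (sq + 1) // 2)
                v = (sp // 2, sq // 2)
            x, y = G[u], G[v]  # a KeyError here would mean a dropped point was needed
            if u != v:         # u == v is a doubling
                assert (x[0] - y[0], x[1] - y[1]) in known
            new[(e, f)] = (x[0] + y[0], x[1] + y[1])
            ops += 1
        G = new
    return G[(0, 0)], ops
```

For example, `simultaneous_ladder(5, 3)` performs three operations for each of the bit positions 2 and 1 and one final operation, in line with the per-bit cost of 9M + 6S used in the analysis.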
4 Comparison

Now we compare the computational cost of the proposed method to that of both
methods in Section 2. In addition, we compare this to the computational cost
of the method described in IEEE P1363 Draft [8], which is based on the scalar
multiplication using NAF on the elliptic curve with Weierstrass form. This is a
fair comparison because all four methods require no precomputed points. Table
1 shows the computational cost of each method to compute the x-coordinate of
kP + lQ. M, S and I respectively denote the computational costs of a field
multiplication, squaring and inversion.



Table 1. The computational costs of every method

    Method                                         M                 S                 I
    Weierstrass NAF [8]                            (illegible)       14|k| - 9         1
    Montgomery                                     12|k| + 29        8|k|              1
    Weierstrass Simultaneous + NAF (Algorithm 1)   (76|k| + 45)/9    (61|k| + 27)/9    2
    Montgomery Simultaneous (Algorithm 2)          9|k| - 1          6|k| - 2          2



Table 2 shows the computational cost of each method for |k| = 160 and
the total cost when we assume S/M = 0.8 and I/M = 30 [12]. We also
compare the computational cost of our method to that of Weierstrass simultaneous
scalar multiplication using the window method and mixed coordinates, which
requires memory for 13 points [3]. The proposed method, Montgomery
simultaneous scalar multiplication, is about 45% faster than the method described in
IEEE P1363 Draft, and about 25% faster than the method using Montgomery
scalar multiplication and the recovery of the Y-coordinate. Moreover, the proposed
method is about 1% faster than Weierstrass simultaneous scalar multiplication
using NAF. Our method is about 2% slower than Weierstrass simultaneous scalar
multiplication using the window method, but requires much less memory.
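
The totals for the three methods whose cost formulas are derived in Sections 2 and 3 can be recomputed as follows. This is a small sketch with hypothetical function names; it uses the ratios S/M = 0.8 and I/M = 30 stated above and exact rational arithmetic to avoid rounding artifacts.

```python
from fractions import Fraction


def method_costs(bits):
    """(M, S, I) counts per method for |k| = bits, from Sections 2 and 3."""
    k = bits
    return {
        "Montgomery": (12 * k + 29, 8 * k, 1),
        "Weierstrass Simultaneous + NAF": (
            Fraction(76 * k + 45, 9), Fraction(61 * k + 27, 9), 2),
        "Montgomery Simultaneous": (9 * k - 1, 6 * k - 2, 2),
    }


def total_in_m(m, s, i, s_over_m=Fraction(4, 5), i_over_m=30):
    """Total cost expressed in field multiplications."""
    return m + s * s_over_m + i * i_over_m


if __name__ == "__main__":
    for name, counts in method_costs(160).items():
        print(f"{name:32s} {float(total_in_m(*counts)):8.1f}")
```

For |k| = 160 the rounded totals agree with the corresponding entries of Table 2 (3003, 2286 and 2265).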



5 Running Times 



Here we present the running times of each method described in Section 4. For
arbitrary-precision arithmetic over $\mathbb{F}_p$, we used the GNU MP library
(GMP) [6]. The running times were obtained on a Pentium II 300 MHz machine.
We used the following elliptic curve over $\mathbb{F}_p$, where |p| = 162 and the order of
the base point r was 160 bits. #E denotes the order of this elliptic curve.
Table 2. The computational cost of each method for |k| = 160 and S/M = 0.8,
I/M = 30.

    Method                              M       S       I    Total in M (S/M = 0.8, I/M = 30)
    Weierstrass NAF                     2133    2231    1    3938
    Montgomery                          1949    1280    1    3003
    Weierstrass Simultaneous + NAF      1356    1087    2    2286
    Montgomery Simultaneous             1439     958    2    2265
    Weierstrass Simultaneous + Window   1281    1018    4    2215



    p  = 2 0aa6fc4d 8396f3ac 06200db7 3e819694 067a0e7b
    a  = 1 5fed3282 429907d6 03b41b7a 309abf87 bed9bd83
    b  = 1 74019686 9a423134 f3cdf013 b13564d0 ba3999e8
    #E = 4 * 82a9bf13 60e5bceb 01878167 1d478cea 881e1d1d
    A  = 1 8be6a098 c28d6bc0 3286dc51 e7e3f705 8a5b9d98
    B  = 0 120c2550 f6ff7a01 440d78d1 122fa3ac aa70fd53



We obtained average running times to compute the x-coordinate of kP + lQ
by randomly choosing 100 points P, Q over this elliptic curve and 100 integers
k, l < r. Table 3 shows the average time of each method to compute the x-coordinate
of kP + lQ. From Table 3, we see that the proposed method, Montgomery
simultaneous scalar multiplication, is about 44% faster than the method described in
IEEE P1363 Draft, and about 25% faster than the method using Montgomery
scalar multiplication and the recovery of the Y-coordinate. Moreover, the proposed
method is about 3% faster than Weierstrass simultaneous scalar multiplication
using NAF. This shows that the theoretical advantage of our method is actually
observed.



Table 3. The average time of each method

    Method                            Average time (ms)
    Weierstrass NAF                   36.2
    Montgomery                        26.9
    Weierstrass Simultaneous + NAF    20.8
    Montgomery Simultaneous           20.1
6 Elliptic Curve over $\mathbb{F}_{2^n}$

A non-supersingular elliptic curve E over $\mathbb{F}_{2^n}$ is represented by
$E : y^2 + xy = x^3 + ax^2 + b$, where $a, b \in \mathbb{F}_{2^n}$, $b \neq 0$. We can apply the
proposed method to the Montgomery method on elliptic curves over $\mathbb{F}_{2^n}$ [14]. The
advantage of the Montgomery method on elliptic curves over $\mathbb{F}_{2^n}$ is that we need
not consider transformability from Weierstrass form to Montgomery form. Since the
computational cost of a field squaring over $\mathbb{F}_{2^n}$ is much lower than that of a field
multiplication over $\mathbb{F}_{2^n}$, we can ignore it.

In the Montgomery method, the computational costs of the addition formulas and
doubling formulas are respectively 4M and 2M in projective coordinates. If we
compute kP + lQ using Montgomery scalar multiplication, we require the addition
formulas twice and the doubling formulas twice per bit of k. Therefore, the
computational cost per bit of k is estimated to be about 12M.

On the other hand, if we compute kP + IQ using Montgomery simultaneous 
scalar multiplication, we require addition formulas twice and doubling formulas 
once at probability of 3/4, and addition formulas three times at probability of 
1/4 per bit of k, as described in Algorithm 2. Since we require addition formulas 
9/4 times and doubling formulas 3/4 times per bit of k, the computational cost 
per bit of k is estimated to be about 21/2 • M. 

In Weierstrass simultaneous scalar multiplication over $\mathbb{F}_{2^n}$ using NAF [7,13],
the computational cost per bit of k is estimated to be about 9M if a = 0, 1.

Therefore, the proposed method is only 13% faster than the method using
Montgomery scalar multiplication and 17% slower than Weierstrass simultaneous
scalar multiplication using NAF. This shows that the proposed method on elliptic
curves over $\mathbb{F}_{2^n}$ is not as efficient as that on elliptic curves with Montgomery
form over $\mathbb{F}_p$.
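
The per-bit estimates of this section follow from a short calculation, sketched below with hypothetical variable names. The operation costs (4M per addition, 2M per doubling), the probabilities 3/4 and 1/4, and the figure of 9M for the Weierstrass method are the ones stated in the text.

```python
from fractions import Fraction

ADD, DBL = 4, 2  # field multiplications per addition / doubling over F_{2^n}

# Two separate Montgomery ladders: 2 additions and 2 doublings per bit.
separate = 2 * ADD + 2 * DBL

# Simultaneous ladder: 2 additions + 1 doubling with probability 3/4,
# 3 additions with probability 1/4.
simultaneous = Fraction(3, 4) * (2 * ADD + DBL) + Fraction(1, 4) * (3 * ADD)

weierstrass_naf = 9  # per-bit estimate quoted from [7,13] for a = 0, 1

gain_over_separate = 1 - simultaneous / separate    # fraction of work saved
loss_vs_weierstrass = simultaneous / weierstrass_naf - 1

if __name__ == "__main__":
    print(separate, float(simultaneous), weierstrass_naf)
    print(f"{float(gain_over_separate):.1%} faster, "
          f"{float(loss_vs_weierstrass):.1%} slower")
```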



7 Conclusion 

We proposed a new method to compute the x-coordinate of kP + lQ
simultaneously on the elliptic curve with Montgomery form over $\mathbb{F}_p$ without precomputed
points. Computing the x-coordinate of kP + lQ is required in ECDSA signature
verification. Our method is about 25% faster than the method using Montgomery
scalar multiplication and the recovery of the Y-coordinate over $\mathbb{F}_p$, and slightly
faster than Weierstrass simultaneous scalar multiplication over $\mathbb{F}_p$ using NAF
and mixed coordinates. Our method is considered to be particularly useful
when ECDSA signature generation is performed using Montgomery scalar
multiplication on the elliptic curve over $\mathbb{F}_p$ because of its computational
efficiency and its immunity to timing attacks, since all arithmetic on the elliptic curve
can then be computed in Montgomery form and no transformation
to the elliptic curve with Weierstrass form is required. Furthermore, we showed that our
method is applicable to the Montgomery method on elliptic curves over $\mathbb{F}_{2^n}$.
Abstract. We discuss multidoubling methods for efficient elliptic scalar
multiplication. The methods allow computation of 2^k P directly from P
without computing the intermediate points, where P denotes a randomly
selected point on an elliptic curve. We introduce algorithms for elliptic
curves with Montgomery form and Weierstrass form defined over finite
fields with characteristic greater than 3 in terms of affine coordinates.
These algorithms are faster than k repeated doublings. Moreover, we
apply the algorithms to scalar multiplication on elliptic curves and
analyze their computational complexity. In our implementation with
respect to the Montgomery and Weierstrass forms in terms of affine
coordinates, we achieved running time reduced by 28% and 31%, respectively,
in the scalar multiplication of an elliptic curve of size 160-bit over finite
fields with characteristic greater than 3.

Keywords. Elliptic curve cryptosystems, Scalar multiplication, Montgomery
form, Multidoubling, Fast implementation



1 Introduction 

Elliptic curve cryptosystems, which were suggested by Miller [Mi85] and Koblitz 
[Ko87], are now widely used in various security services. IEEE and other stan- 
dardizing bodies such as ANSI and ISO are in the process of standardizing el- 
liptic curve cryptosystems. Therefore, it is very attractive to provide algorithms 
that allow efficient implementation. Encryption/decryption or signature genera- 
tion/verification schemes require computation of scalar multiplication. The com- 
putational performance of cryptographic protocols with elliptic curves strongly 
depends on the efficiency of the scalar multiplication. Thus, fast scalar multipli- 
cation is essential for elliptic curve cryptosystems. 

One method to increase doubling speed involves “multidoubling”, which
computes 2^k P directly from P ∈ E(F_q), without computing the intermediate
points 2P, 2^2 P, ..., 2^{k-1} P. The concept of multidoubling was first suggested by
Guajardo and Paar in [GP97]. They formulated algorithms for the multidoubling
of 4P, 8P and 16P on elliptic curves over F_{2^n} in terms of affine coordinates.
Recent related results include a formula for computing 4P on elliptic curves over F_p
in affine coordinates by Muller [Mu97] and a formula for computing 4P on elliptic
curves over F_p in projective coordinates by Miyaji, Ono and Cohen [MOC97a].
These formulae are more efficient than repeated doublings. All of the previous
works were on the subject of elliptic curves with Weierstrass form. Another
model of an elliptic curve that is useful for cryptosystems is the Montgomery
form. Montgomery introduced the equation to speed up integer factorization
with elliptic curves [Mo87]. The elliptic curve method of factoring was proposed
by H. W. Lenstra [Le87]. In recent years, several authors have proposed elliptic
curve cryptosystems using the Montgomery model [Iz99,LD99,OKS00].

In this paper, we propose efficient algorithms for speeding up elliptic curve
cryptosystems with Montgomery elliptic curves in terms of affine coordinates.
We construct efficient formulae that compute 2^k P directly for all k ≥ 2. In the
case of an elliptic curve with Montgomery form, our formulae have computational
complexity (8k + 4)M + (4k - 1)S + I, where M, S, and I denote multiplication,
squaring and inversion in F_p, respectively. This is more efficient than k repeated
doublings, which require k inversions. When implementing our multidoubling
method, experimental results show that computing 16P achieved a running time
reduced by 40% over 4 doublings in affine coordinates. Moreover, we introduce
formulae that compute 2^k P directly for all k ≥ 2, for Weierstrass elliptic curves in
terms of affine coordinates. Our formulae have computational complexity
(4k + 1)M + (4k + 1)S + I. The formulae have a slightly simpler form than the
formulae described in [SS01] and save one field multiplication compared with
the formulae proposed in [SS00].

As a result of our implementation with respect to Montgomery and Weierstrass
forms in terms of affine coordinates, we achieved running time reduced by
28% and 31%, respectively, in the scalar multiplication of an elliptic curve of size
160-bit. We also discuss the computational complexity of scalar multiplication
using multidoubling. The proposed algorithms improve the performance of scalar
multiplication with the binary method, as well as the window method. Therefore,
they are effective in restricted environments where resources are limited,
such as smart cards.

2 Previous Work 

In this section, we summarize the multidoubling, the direct computation and 
arithmetic for an elliptic curve with Montgomery form. 

2.1 Multidoubling and Direct Computation 

The concept of using multidoubling and direct computation of 2^k P to efficiently
implement elliptic scalar multiplication was first proposed by Guajardo and Paar
in [GP97]. They formulated algorithms for computing 4P, 8P, and 16P on elliptic
curves over F_{2^n} in terms of affine coordinates. In recent years, several authors




270 Yasuyuki Sakai and Kouichi Sakurai 



have reported methods that compute 2^k P directly, but some of them are limited
to small k. The following list summarizes the previous work on multidoubling
and direct computation.

1. Guajardo and Paar [GP97] proposed formulae for computing 4P, 8P, and
16P on elliptic curves over F_{2^n} in terms of affine coordinates.

2. Muller [Mu97] proposed formulae for computing 4P on elliptic curves over
F_p in terms of affine coordinates.

3. Miyaji, Ono, and Cohen [MOC97a] proposed formulae for computing 4P on
elliptic curves over F_p in terms of projective coordinates.

4. Han and Tan [HT99] proposed formulae for computing 3P, 5P, 6P, 7P, etc.,
on elliptic curves over F_{2^n} in terms of affine coordinates.

5. Sakai and Sakurai [SS00,SS01] proposed formulae for computing 2^k P (for all
k ≥ 1) on elliptic curves over F_p in terms of affine coordinates.

We should remark that the algorithm proposed by Cohen, Miyaji and Ono in
[CMO98] can be efficiently used for the direct computation of several doublings.
The authors call their algorithm a “modified Jacobian” coordinate system. The
coordinate system uses a (redundant) mixed representation such as (X, Y, Z, aZ^4).
Doubling in terms of the modified Jacobian coordinates has computational
advantages over weighted projective (Jacobian) coordinates. Itoh et al. also gave a
similar method for doubling [ITTTK99].

All of these works dealt with computations on Weierstrass elliptic curves. In 
later sections, we will formulate algorithms that work on Montgomery elliptic 
curves in terms of affine coordinates, and analyze their computational complex- 
ity. 

2.2 Elliptic Curves with Montgomery Model 

Let a, b ∈ F_p with 4a^3 + 27b^2 ≠ 0, where p > 3 is a prime number. An elliptic
curve over F_p in the Weierstrass model is defined by the following equation (1).
Elliptic curve cryptosystems using curves with Weierstrass form are in the
process of being standardized, e.g., [IEEE], and are widely used in various security
services.

    E : y^2 = x^3 + ax + b                                        (1)

H. W. Lenstra proposed the elliptic curve method of factoring [Le87].
Montgomery introduced the following equation to speed up integer factorization with
elliptic curves [Mo87]. In recent years, several authors have proposed
cryptosystems using an elliptic curve with Montgomery form [Iz99,LD99,OKS00]. Let
A, B ∈ F_p with (A^2 - 4)B ≠ 0. An elliptic curve in the Montgomery model is
defined by the following equation (2).

    E_M : Bv^2 = u^3 + Au^2 + u                                   (2)

The formulae for transforming between Montgomery and Weierstrass forms are
given by the following (see [Iz99] for details).
By the transformation u = Bx - A/3 and v = By we obtain

    y^2 = x^3 + \frac{3 - A^2}{3B^2} x + \frac{2A^3 - 9A}{27B^3}.

Therefore, by the relationship a = (3 - A^2)/(3B^2) and b = (2A^3 - 9A)/(27B^3), we can
transform a Montgomery form into a Weierstrass form.

The above linear transformation clearly converts any elliptic curve with
Montgomery form into a curve with Weierstrass form. However, the inverse
transformation, from Weierstrass form to Montgomery form, works only for
particular curves. Based on the above relationship between (a, b) and (A, B), we
eliminate B and obtain A^6 - 9A^4 - 27(r - 1)A^2 + 27(4r - 1) = 0, where
r = a^3/(4a^3 + 27b^2).

Consider the equation f(t) = t^3 - 9t^2 - 27(r - 1)t + 27(4r - 1) = 0,
where t = A^2. If f(t) has a solution t = α that is a quadratic residue in F_p,
the Weierstrass form can be transformed into a Montgomery form by the following
relation. Let β be a square root of α; we obtain A = β and
B = a(2β^3 - 9β)/(9b(3 - β^2)). Then the relations x = u/B + A/(3B) and
y = v/B are derived.

A detailed analysis of the cases in which a Weierstrass form can be transformed
into a Montgomery form was given by Izu [Iz99]. He concluded that approximately
40% of curves with Weierstrass form can be transformed into a curve with
Montgomery form.
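
The Montgomery-to-Weierstrass direction can be checked numerically. The following is a minimal sketch, assuming the standard change of variables x = u/B + A/(3B), y = v/B with a = (3 - A^2)/(3B^2) and b = (2A^3 - 9A)/(27B^3); the toy parameters p = 103, A = 5, B = 7 and all function names are illustrative assumptions, not values from the paper.

```python
# Sketch: map a Montgomery curve B*v^2 = u^3 + A*u^2 + u over F_p to
# Weierstrass form y^2 = x^3 + a*x + b. Toy parameters, chosen only for illustration.
p, A, B = 103, 5, 7          # requires p > 3 prime and (A^2 - 4)*B != 0 mod p

def inv(z):
    """Inverse in F_p via Fermat's little theorem (z must be nonzero mod p)."""
    return pow(z % p, p - 2, p)

def montgomery_to_weierstrass():
    """Return the Weierstrass coefficients (a, b) of the transformed curve."""
    a = (3 - A * A) * inv(3 * B * B) % p
    b = (2 * A**3 - 9 * A) * inv(27 * B**3) % p
    return a, b

def map_point(u, v):
    """Map a Montgomery point (u, v) to the Weierstrass point (x, y)."""
    x = (u * inv(B) + A * inv(3 * B)) % p
    y = v * inv(B) % p
    return x, y

# Enumerate all affine points of the toy Montgomery curve by brute force.
points = [(u, v) for u in range(p) for v in range(p)
          if (B * v * v - (u**3 + A * u * u + u)) % p == 0]

a, b = montgomery_to_weierstrass()
mapped = [map_point(u, v) for (u, v) in points]
```

Every mapped point should satisfy y^2 = x^3 + ax + b, which is an easy sanity check for the coefficient formulas.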

2.3 Group Operation for Elliptic Curves with Montgomery Form 

We describe algorithms for the group operations on an elliptic curve with
Montgomery form. When we estimate computational efficiency, we will ignore the
cost of a field addition, as well as the cost of a multiplication by small constants.



Affine Coordinates. Suppose P_3(u_3, v_3) = P_1(u_1, v_1) + P_2(u_2, v_2) ∈ E_M(F_p),
and P_1 ≠ P_2. The addition formulae are given by the following.

    u_3 = B\lambda^2 - A - u_1 - u_2
    v_3 = \lambda(u_1 - u_3) - v_1                                (3)
    \lambda = \frac{v_1 - v_2}{u_1 - u_2}

The computational complexity of an addition is 3M + S + I.

Suppose P_3(u_3, v_3) = 2P_1(u_1, v_1) ∈ E_M(F_p). Point doubling can be
accomplished by the following.

    u_3 = B\lambda^2 - A - 2u_1
    v_3 = \lambda(u_1 - u_3) - v_1                                (4)
    \lambda = \frac{3u_1^2 + 2Au_1 + 1}{2Bv_1}

The computational complexity of a doubling is 5M + 2S + I.
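
The affine formulae for addition and doubling on a Montgomery curve translate almost line by line into code. The sketch below is a minimal illustration under assumed toy parameters (p = 103, A = 5, B = 7); the function names are hypothetical, and the exceptional cases (equal u-coordinates in an addition, v = 0 in a doubling) are deliberately not handled.

```python
# Sketch of affine addition and doubling on B*v^2 = u^3 + A*u^2 + u over F_p.
# Toy parameters for illustration only; exceptional cases are not handled.
p, A, B = 103, 5, 7

def inv(z):
    """Field inversion (the single expensive operation I in each formula)."""
    return pow(z % p, p - 2, p)

def on_curve(P):
    u, v = P
    return (B * v * v - (u**3 + A * u * u + u)) % p == 0

def mont_add(P1, P2):
    """P1 + P2 for points with different u-coordinates: 3M + S + I."""
    (u1, v1), (u2, v2) = P1, P2
    lam = (v1 - v2) * inv(u1 - u2) % p
    u3 = (B * lam * lam - A - u1 - u2) % p
    v3 = (lam * (u1 - u3) - v1) % p
    return u3, v3

def mont_double(P1):
    """2*P1 for a point with v1 != 0: 5M + 2S + I."""
    u1, v1 = P1
    lam = (3 * u1 * u1 + 2 * A * u1 + 1) * inv(2 * B * v1) % p
    u3 = (B * lam * lam - A - 2 * u1) % p
    v3 = (lam * (u1 - u3) - v1) % p
    return u3, v3

# All affine points of the toy curve, found by brute force.
points = [(u, v) for u in range(p) for v in range(p) if on_curve((u, v))]
```

A useful consistency check is that 2(2P) and ((2P + P) + P) agree whenever none of the exceptional cases is hit.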



Projective Coordinates. Next, we describe formulae for the group operations
in projective coordinates. Let u = U/W, v = V/W. Suppose P_2(U_2, W_2) =
P_1(U_1, W_1) + P(u, v, 1). The point P_3(U_3, W_3) = P_1(U_1, W_1) + P_2(U_2, W_2) can
be computed by the following.

    U_3 = (U_1 U_2 - W_1 W_2)^2
        = (Sub_1 Add_2 + Add_1 Sub_2)^2
    W_3 = u (U_1 W_2 - W_1 U_2)^2
        = u (Sub_1 Add_2 - Add_1 Sub_2)^2

where Add_1 = U_1 + W_1, Sub_1 = U_1 - W_1, Add_2 = U_2 + W_2 and Sub_2 = U_2 - W_2.
The two expressions in each pair agree up to a common factor of 4, which does
not change the projective point. The computational complexity of an addition
is 3M + 2S.

Note that the v-coordinate does not enter into any of the formulae. An addition
can be accomplished without computation of the V-coordinate if the difference
between the two given points is known [Mo87]. Point doubling can be
accomplished by the following.

    U_3 = (U_1^2 - W_1^2)^2 = Add_1^2 Sub_1^2
    W_3 = 4 U_1 W_1 (U_1^2 + A U_1 W_1 + W_1^2)
        = (Add_1^2 - Sub_1^2)(Sub_1^2 + C(Add_1^2 - Sub_1^2))

where C = (A + 2)/4. For a given curve, C can be pre-computed. Therefore the above
formulae have computational complexity 3M + 2S.

The basic well-known method for elliptic scalar multiplication on curves with
Weierstrass form is the “double-and-add” (or binary) method. There are several
methods which have a computational advantage over the binary method, such as
the window method. However, in the case of elliptic curves with Montgomery
form in terms of projective coordinates, we cannot apply such methods, because
the difference between the two given points, i.e., the U-coordinate of P_2 - P_1,
must be known when adding the two points. To compute kP, we compute 2P and
then repeatedly compute two points (2mP, (2m+1)P) or ((2m+1)P, (2m+2)P),
depending on whether the corresponding bit in the binary representation of k is a
0 or a 1 [AMV93,Mo87,MV93]. This method maintains the invariant that the
difference of the two points is always P.
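
The ladder described above can be sketched with the projective (U, W) formulae. This is a minimal illustration, not the authors' implementation: the toy parameters (p = 103, A = 5, B = 7), the base point search and the function names are assumptions, and the base point is assumed to have u ≠ 0.

```python
# Sketch of the Montgomery ladder using only (U : W) coordinates.
# Toy parameters for illustration; the base point must have u != 0.
p, A, B = 103, 5, 7
C = (A + 2) * pow(4, p - 2, p) % p      # C = (A + 2)/4, precomputed once

def xadd(P1, P2, u):
    """(U3 : W3) of P1 + P2, where u is the affine u-coordinate of P2 - P1."""
    (U1, W1), (U2, W2) = P1, P2
    s = (U1 - W1) * (U2 + W2) % p       # Sub1 * Add2
    t = (U1 + W1) * (U2 - W2) % p       # Add1 * Sub2
    return (s + t) ** 2 % p, u * (s - t) ** 2 % p

def xdbl(P1):
    """(U3 : W3) of 2*P1."""
    U1, W1 = P1
    add_sq = (U1 + W1) ** 2 % p
    sub_sq = (U1 - W1) ** 2 % p
    d = (add_sq - sub_sq) % p           # equals 4*U1*W1
    return add_sq * sub_sq % p, d * (sub_sq + C * d) % p

def ladder(k, u):
    """Return (U, W) such that U/W is the u-coordinate of kP, for k >= 1."""
    R0, R1 = (u % p, 1), xdbl((u % p, 1))          # the pair (P, 2P)
    for bit in bin(k)[3:]:                         # bits below the leading 1
        if bit == '0':
            R0, R1 = xdbl(R0), xadd(R0, R1, u)     # (2mP, (2m+1)P)
        else:
            R0, R1 = xadd(R0, R1, u), xdbl(R1)     # ((2m+1)P, (2m+2)P)
    return R0

# A base point with u != 0 and v != 0, found by brute force on the toy curve.
P = next((u, v) for u in range(1, p) for v in range(1, p)
         if (B * v * v - (u**3 + A * u * u + u)) % p == 0)
U, W = ladder(5, P[0])
```

Both updates in each loop iteration preserve R1 - R0 = P, which is exactly why the differential addition `xadd` can always be applied.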

3 The Proposed Algorithms 

In this section, we describe new algorithms for elliptic curves with Montgomery
form, which compute 2^k P directly from a given point P ∈ E_M(F_p) without
computing the intermediate points 2P, 2^2 P, ..., 2^{k-1} P. We will begin by
constructing formulae for small k, then we will construct an algorithm for general
k (k ≥ 2). We will also show an algorithm that computes 2^k P directly for elliptic
curves with Weierstrass form. This is an improved version of the algorithm
proposed in [SS00,SS01].

3.1 Montgomery Form 

As an example, we give an algorithm that computes 8P directly from P ∈ E_M(F_p).
Computing 8P. Let P_8(u_8, v_8) = 8P_1(u_1, v_1) ∈ E_M(F_p). For an elliptic curve
with Montgomery form in terms of affine coordinates, P_8 can be computed in
the following way. The derivation is based on repeated substitution of the point
doubling formulae, such that only one field inversion needs to be calculated. First
we compute C, D_i, E_i, F_i, for 1 ≤ i ≤ 3 as follows.

    C   = AB
    D_1 = u_1 B
    E_1 = v_1
    F_1 = 1 + u_1(2A + 3u_1)
    D_2 = -2^2 C E_1^2 - 8 D_1 E_1^2 + F_1^2
    E_2 = -8 B^2 E_1^4 + F_1(4 D_1 E_1^2 - D_2)
    F_2 = 2^4 B^2 E_1^4 + D_2(2^3 C E_1^2 + 3 D_2)
    D_3 = -2^4 C (E_1 E_2)^2 - 8 D_2 E_2^2 + F_2^2
    E_3 = -8 E_2^4 + F_2(4 D_2 E_2^2 - D_3)
    F_3 = 2^8 B^2 (E_1 E_2)^4 + D_3(2^5 C (E_1 E_2)^2 + 3 D_3)

Then we compute u_8 and v_8 as follows.

    u_8 = \frac{-8 E_3^2 (2^3 C (E_1 E_2)^2 + D_3) + F_3^2}{2^6 B (E_1 E_2 E_3)^2}

    v_8 = \frac{F_3((2^6 C (E_1 E_2)^2 + 12 D_3) E_3^2 - F_3^2) - 8 E_3^4}{2^9 B^2 (E_1 E_2 E_3)^3}

Note that C and B^2 can be pre-computed, and that although the denominator
of u_8 differs from that of v_8, the above formulae require only one inversion
if we multiply the numerator of u_8 by 2^3 B E_1 E_2 E_3. The above formulae have
computational complexity 25M + 11S + I.

Multidoubling. From the formulae that compute 2^k P for small k, given in
the previous subsection, we can easily obtain general formulae that allow direct
doubling P ↦ 2^k P for k ≥ 2. The figure shown below describes the formulae,
and their computational complexity is given as Theorem 1.

Algorithm 1: Direct computation of 2^k P in affine coordinates on an elliptic curve
with Montgomery form, where k ≥ 2 and P ∈ E_M(F_p).

INPUT: P_1 = (u_1, v_1) ∈ E_M(F_p)

OUTPUT: P_{2^k} = 2^k P_1 = (u_{2^k}, v_{2^k}) ∈ E_M(F_p)

Pre Computations

    C   = AB
    B_2 = B^2

Step 1. Compute D_0, E_0 and F_0

    D_0 = u_1 B
    E_0 = v_1
    F_0 = 1 + u_1(2A + 3u_1)

Step 2. For i from 1 to k compute D_i and E_i, for i from 1 to k - 1 compute F_i

    D_i = -2^{2i} C (\prod_{j=0}^{i-1} E_j)^2 - 8 D_{i-1} E_{i-1}^2 + F_{i-1}^2

    if i = 1:  E_1 = -8 B_2 E_0^4 + F_0(4 D_0 E_0^2 - D_1)
    else:      E_i = -8 E_{i-1}^4 + F_{i-1}(4 D_{i-1} E_{i-1}^2 - D_i)

    F_i = (\prod_{j=0}^{i-1} E_j)^2 (2^{4i} B_2 (\prod_{j=0}^{i-1} E_j)^2 + 2^{2i+1} C D_i) + 3 D_i^2

Step 3. Compute u_{2^k} and v_{2^k}

    u_{2^k} = \frac{D_k}{2^{2k} B (\prod_{j=0}^{k-1} E_j)^2}

    v_{2^k} = \frac{E_k}{2^{3k} B_2 (\prod_{j=0}^{k-1} E_j)^3}

Theorem 1. For an elliptic curve with Montgomery form in terms of affine
coordinates, there exists an algorithm that computes 2^k P, with k ≥ 2, in at most
8k + 4 field multiplications, 4k - 1 field squarings and one inversion in F_p for any
point P ∈ E_M(F_p), excluding precomputation.

The proof is outlined in Appendix A.1.
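
The recurrences for D_i, E_i and F_i can be exercised on a small curve and compared against k repeated doublings. The sketch below is one possible reading of the multidoubling for Montgomery form, obtained by substituting the doubling formula into itself; the toy parameters (p = 103, A = 5, B = 7), the function names and the use of Python big integers are assumptions, and no attempt is made to match the operation counts of Theorem 1.

```python
# Sketch: direct computation of 2^k P on a Montgomery curve with one inversion,
# checked against repeated affine doubling. Toy parameters for illustration only.
p, A, B = 103, 5, 7

def double(P):
    """One affine doubling (needs its own inversion); requires v != 0."""
    u, v = P
    lam = (3 * u * u + 2 * A * u + 1) * pow(2 * B * v % p, p - 2, p) % p
    u3 = (B * lam * lam - A - 2 * u) % p
    return u3, (lam * (u - u3) - v) % p

def multidouble(P, k):
    """Return 2^k P; assumes the v-coordinates of P, 2P, ..., 2^(k-1)P are nonzero."""
    u1, v1 = P
    C, B2 = A * B % p, B * B % p
    D, E = u1 * B % p, v1 % p
    F = (1 + u1 * (2 * A + 3 * u1)) % p
    prod_e = 1                                   # E_0 * ... * E_{i-1}
    for i in range(1, k + 1):
        prod_e = prod_e * E % p
        pe2 = prod_e * prod_e % p
        e2 = E * E % p
        d_new = (-(4**i) * C * pe2 - 8 * D * e2 + F * F) % p
        lead = 8 * B2 * e2 * e2 if i == 1 else 8 * e2 * e2
        e_new = (-lead + F * (4 * D * e2 - d_new)) % p
        if i < k:                                # F_k is never needed
            F = (pe2 * (16**i * B2 * pe2 + 2**(2 * i + 1) * C * d_new)
                 + 3 * d_new * d_new) % p
        D, E = d_new, e_new
    # Single inversion of 2^(3k) * B^2 * (prod E_j)^3.
    t2 = pow(2**(3 * k) * B2 * prod_e**3 % p, p - 2, p)
    u = D * B * prod_e * 2**k * t2 % p
    v = E * t2 % p
    return u, v

points = [(u, v) for u in range(p) for v in range(p)
          if (B * v * v - (u**3 + A * u * u + u)) % p == 0]
```

The only inversion is the final `pow(..., p - 2, p)`; the u-coordinate reuses it after being multiplied by 2^k B times the product of the E_j, as the text describes for 8P.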

3.2 Weierstrass Form 

The multidoubling for Weierstrass elliptic curves in terms of affine coordinates
is given below. Its computational complexity, given as Theorem 2, is
(4k + 1)M + (4k + 1)S + I. This saves one field multiplication compared with
the formulae proposed in [SS00]. Moreover, the formulae have
a slightly simpler form than the formulae described in [SS01].



Algorithm 2: Direct computation of 2^k P in affine coordinates on an elliptic curve
with Weierstrass form, where k ≥ 1 and P ∈ E(F_p).

INPUT: P_1 = (x_1, y_1) ∈ E(F_p)

OUTPUT: P_{2^k} = 2^k P_1 = (x_{2^k}, y_{2^k}) ∈ E(F_p)

Step 1. Compute A_0, B_0 and C_0

    A_0 = x_1
    B_0 = y_1
    C_0 = 3x_1^2 + a

Step 2. For i from 1 to k compute A_i and B_i, for i from 1 to k - 1 compute C_i

    A_i = C_{i-1}^2 - 8 A_{i-1} B_{i-1}^2
    B_i = -8 B_{i-1}^4 - C_{i-1}(A_i - 4 A_{i-1} B_{i-1}^2)
    C_i = 3 A_i^2 + 16^i a (\prod_{j=0}^{i-1} B_j)^4

Step 3. Compute x_{2^k} and y_{2^k}

    x_{2^k} = \frac{A_k}{(2^k \prod_{i=0}^{k-1} B_i)^2}

    y_{2^k} = \frac{B_k}{(2^k \prod_{i=0}^{k-1} B_i)^3}

Theorem 2. For an elliptic curve with Weierstrass form in terms of affine
coordinates, there exists an algorithm that computes 2^k P in at most 4k + 1
multiplications, 4k + 1 squarings, and one inversion in F_p for any point
P ∈ E(F_p).

The proof is outlined in Appendix A.2.
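
The Weierstrass multidoubling can be sketched in the same way. In this illustration the sign of B_i is taken as -8B_{i-1}^4 - C_{i-1}(A_i - 4A_{i-1}B_{i-1}^2), which is what direct substitution of the doubling formula yields and which makes y_{2^k} = B_k/(2^k ∏ B_i)^3 hold for every k; that sign convention, the toy curve y^2 = x^3 + 2x + 3 over F_103 and the function names are assumptions of the sketch.

```python
# Sketch: direct computation of 2^k P on y^2 = x^3 + a*x + b with one inversion,
# checked against repeated affine doubling. Toy parameters for illustration only.
p, a, b = 103, 2, 3

def double(P):
    """One affine doubling (2M + 2S + I); requires y != 0."""
    x, y = P
    lam = (3 * x * x + a) * pow(2 * y % p, p - 2, p) % p
    x3 = (lam * lam - 2 * x) % p
    return x3, (lam * (x - x3) - y) % p

def multidouble(P, k):
    """Return 2^k P; assumes the y-coordinates of P, 2P, ..., 2^(k-1)P are nonzero."""
    x1, y1 = P
    Ai, Bi = x1 % p, y1 % p
    Ci = (3 * x1 * x1 + a) % p
    prod_b = 1                      # B_0 * ... * B_{i-1}
    U = a % p                       # a * (B_0 * ... * B_{i-1})^4, built up incrementally
    for i in range(1, k + 1):
        prod_b = prod_b * Bi % p
        b2 = Bi * Bi % p
        b4 = b2 * b2 % p
        a_new = (Ci * Ci - 8 * Ai * b2) % p
        b_new = (-8 * b4 - Ci * (a_new - 4 * Ai * b2)) % p
        if i < k:                   # C_k is never needed
            U = U * b4 % p
            Ci = (3 * a_new * a_new + 16**i * U) % p
        Ai, Bi = a_new, b_new
    T = pow(2**k * prod_b % p, p - 2, p)     # the single inversion
    T2 = T * T % p
    return Ai * T2 % p, Bi * T2 * T % p

points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - (x**3 + a * x + b)) % p == 0]
```

Comparing `multidouble(P, k)` with k calls to `double` on every point of the toy curve is a quick way to confirm the recurrences, including the sign of B_i.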



3.3 Complexity Comparison on Direct Computation 

In this subsection, we compare the computational complexity of the
multidoubling given in the previous subsection with the complexity of k separate repeated
doublings. The complexity of a doubling is estimated from the algorithm given
by the formulae (4) or as shown in [IEEE]. Tables 1 and 2 show the number
of multiplications M, squarings S, and inversions I in the base field F_p. Note
that our method reduces inversions at the cost of multiplications. Therefore, the
performance of the new formulae depends on the cost factor of one inversion
relative to one multiplication. For this purpose, we introduce the notion of a
“break-even point”, as used in [GP97]. It is possible to express the time that it
takes to perform one inversion in terms of the equivalent number of
multiplications needed per inversion. In this comparison, we assume that one squaring has
complexity S = 0.8M, and that the costs of field addition and multiplication by
small constants can be ignored.

As we can see from Table 1, if a field inversion has complexity I > 10.4M,
one quadrupling will be more efficient than two separate doublings. If F_p has
size 160-bit or larger, it is likely that I > 10.4M in many implementations (e.g.,
see [WMPW98]). In addition, if k > 2, our direct computation method is more
efficient than individual doublings in most implementations. For Weierstrass
form, shown in Table 2, our direct computation method is more efficient than
individual doublings in most implementations.
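
The break-even point follows directly from the operation counts: it is the extra multiplication work of the direct method divided by the number of inversions it saves. The sketch below assumes S = 0.8M as in the text; the function name and tuple layout are illustrative choices.

```python
# Sketch: break-even inversion cost (in units of M) between a direct 2^k P
# computation and k separate doublings, assuming S = 0.8M.

def break_even(direct, separate, s_ratio=0.8):
    """direct and separate are (M, S, I) operation counts.

    Returns the ratio I/M above which the direct computation is cheaper.
    """
    (m1, s1, i1), (m2, s2, i2) = direct, separate
    extra_mults = (m1 + s_ratio * s1) - (m2 + s_ratio * s2)
    saved_inversions = i2 - i1
    return extra_mults / saved_inversions

# Operation counts as listed in Tables 1 and 2.
montgomery_4p = break_even((18, 7, 1), (10, 4, 2))
montgomery_8p = break_even((25, 11, 1), (15, 6, 3))
weierstrass_8p = break_even((13, 13, 1), (6, 6, 3))
weierstrass_16p = break_even((17, 17, 1), (8, 8, 4))
```

For example, the quadrupling on a Montgomery curve costs 18M + 7S + I = 23.6M + I against 13.2M + 2I for two doublings, so it wins once one inversion costs more than 10.4M.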
Table 1. Complexity comparison on direct computation: Montgomery form

Calculation  Method                Complexity            Break-Even Point
                                   M       S       I
4P           Direct computation    18      7       1     10.4M < I
             Separate 2 doublings  10      4       2
8P           Direct computation    25      11      1     7.0M < I
             Separate 3 doublings  15      6       3
16P          Direct computation    32      15      1     5.9M < I
             Separate 4 doublings  20      8       4
2^k P        Direct computation    8k + 4  4k - 1  1     (4.6 + 7.8/(k - 1))M < I
             Separate k doublings  5k      2k      k



Table 2. Complexity comparison on direct computation: Weierstrass form

Calculation  Method                Complexity            Break-Even Point
                                   M       S       I
4P           Direct computation    9       9       1     9.0M < I
             Separate 2 doublings  4       4       2
8P           Direct computation    13      13      1     6.3M < I
             Separate 3 doublings  6       6       3
16P          Direct computation    17      17      1     5.4M < I
             Separate 4 doublings  8       8       4
2^k P        Direct computation    4k + 1  4k + 1  1     (3.6 + 5.4/(k - 1))M < I
             Separate k doublings  2k      2k      k



4 Scalar Multiplication with Direct Computation 

4.1 The Algorithm 

Using our previous formulae for direct computation of 2^k P, we can improve
elliptic scalar multiplication with the sliding signed binary window method
[Go98,KT92]. For example, we apply our new formulae to the window method
with windows of length 4. We represent the scalar m in mP in non-adjacent
form (NAF) ^1. For example, m = (1101110111)_2 will be represented as
m' = (1001̄0001̄001̄)_NAF, where 1̄ denotes -1.

^1 Koyama and Tsuruoka pointed out that an NAF is not necessarily the optimal
representation to use [Go98,KT92]. Although it has minimal weight, allowing a few
adjacent nonzeros may increase the length of zero-runs, which, in turn, would
reduce the total number of additions. Their method may be useful for our scalar
multiplication with direct computations of 2^k P.
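
The NAF recoding of a scalar can be sketched in a few lines. This is the standard right-to-left recoding, given as an illustration; the function name is an assumption.

```python
# Sketch: non-adjacent form (NAF) of a positive integer, most significant digit first.

def naf(m):
    """Return the NAF digits of m > 0 as a list with entries in {-1, 0, 1}."""
    digits = []
    while m > 0:
        if m & 1:
            d = 2 - (m % 4)      # +1 if m = 1 mod 4, -1 if m = 3 mod 4
        else:
            d = 0
        digits.append(d)
        m = (m - d) >> 1         # m - d is even, so the shift is exact
    return digits[::-1]

example = naf(0b1101110111)      # the scalar used as the example in the text
```

Choosing the digit from m mod 4 forces the next digit to be zero, which is what guarantees that no two adjacent digits are nonzero.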
Algorithm 3 describes scalar multiplication on elliptic curves using our direct
computations of 2^k P for the case k up to 4.

Algorithm 3: Elliptic scalar multiplication combining our direct computation
of 2^k P with the window method, and window size k = 4

INPUT: P ∈ E_M(F_p) or E(F_p), m ∈ Z

OUTPUT: mP ∈ E_M(F_p) or E(F_p)

Step 1. Construct NAF representation

    m = (e_t e_{t-1} ... e_1 e_0)_NAF, e_i ∈ {-1, 0, 1}

Step 2. Precomputation

    2.1 P_6 ← 6P
    2.2 For i from 7 to 10 do: P_i ← P_{i-1} + P

Step 3. P_m ← O, i ← t

Step 4. While i ≥ 3 do the following:

    4.1 If e_i = 0 then:
        find the longest bitstring e_i e_{i-1} ... e_l such that
        e_i = e_{i-1} = ... = e_l = 0, and do the following
            P_m ← 2^{i-l+1} P_m
            i ← l - 1

    4.2 else (e_i ≠ 0):
        If (e_i e_{i-1} e_{i-2} e_{i-3})_NAF > 0 then:
            P_m ← 16 P_m + P_{(e_i e_{i-1} e_{i-2} e_{i-3})_NAF}
        else:
            P_m ← 16 P_m - P_{|(e_i e_{i-1} e_{i-2} e_{i-3})_NAF|}
        i ← i - 4

Step 5. P_m ← 2^{i+1} P_m + (e_i ... e_0)_NAF P using the traditional
        double-and-add method

Step 6. Return P_m



In Algorithm 3, we compute 16P directly from P in each window rather than
using 4 separate doublings. In Step 4.1, with strings of zero-runs in the scalar
m_NAF, we should choose the computations 16P, 8P, 4P or 2P optimally. This can
be done with rules such as: 1) if the length of the zero-run equals 4, we compute
16P; 2) if the length of the zero-run equals 3, we compute 8P, and so on. Note
that the computation for Step 5 is inexpensive if m is large.

Using our algorithms for scalar multiplication, many of the doublings in the 
traditional window method will be replaced by the direct computation of 16P. 
Therefore, if one computation of 16P is relatively faster than four doublings, 
scalar multiplication with our method will be significantly improved. We will 
examine this improvement by real implementation in the next section. 
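
The control flow of the window method with direct 2^k P computations can be illustrated without any curve arithmetic. In the sketch below a "point" is modelled as an integer multiple of P, so the result can be checked against m directly; this modelling, the splitting of long zero-runs into chunks of at most 4, and all names are assumptions made for illustration, not the authors' implementation.

```python
# Sketch of the window method (window length 4) over a NAF scalar, with points
# modelled as integers so that "k times P" is simply k * P.

def naf(m):
    """NAF digits of m > 0, least significant digit first."""
    digits = []
    while m > 0:
        d = (2 - (m % 4)) if m & 1 else 0
        digits.append(d)
        m = (m - d) >> 1
    return digits

def window_multiply(m, P=1):
    """Return (mP, operation counts) following the window method with NAF."""
    e = naf(m)                                   # e[i] is the digit of 2^i
    table = {w: w * P for w in range(6, 11)}     # precomputed 6P, ..., 10P
    ops = {"add": 0, "2P": 0, "4P": 0, "8P": 0, "16P": 0}
    acc, i = 0, len(e) - 1                       # acc = O (point at infinity)
    while i >= 3:
        if e[i] == 0:
            l = i
            while l > 0 and e[l - 1] == 0:       # extend the zero-run downwards
                l -= 1
            run = i - l + 1
            while run > 0:                       # one direct 2^step computation per chunk
                step = min(run, 4)
                acc *= 2 ** step
                ops[f"{2 ** step}P"] += 1
                run -= step
            i = l - 1
        else:
            w = 8 * e[i] + 4 * e[i - 1] + 2 * e[i - 2] + e[i - 3]
            acc = 16 * acc + (table[w] if w > 0 else -table[-w])
            ops["16P"] += 1
            ops["add"] += 1
            i -= 4
    while i >= 0:                                # remaining digits: double-and-add
        acc = 2 * acc + e[i] * P
        ops["2P"] += 1
        ops["add"] += int(e[i] != 0)
        i -= 1
    return acc, ops

result, counts = window_multiply(0b1101110111)
```

Because a window always starts at a nonzero NAF digit, its value lies in ±{6, ..., 10}, which is why only five multiples of P need to be precomputed.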

5 Complexity Comparison on Scalar Multiplication

In this section, we discuss the computational complexity of scalar multiplication 
using our direct computation. 
Table 3. Number of computations of 2^k P, where 1 ≤ k ≤ 4, and additions in the
sliding signed binary window method with window length of 4

Curves    Add     2P      4P      8P     16P
160-bit   36.82   14.93   4.99    2.58   31.75
192-bit   37.63   15.29   5.06    2.64   39.54
224-bit   37.77   15.31   5.05    2.69   47.49
256-bit   41.77   15.31   5.06    2.70   55.53
384-bit   48.76   17.13   5.10    3.59   86.26
521-bit   90.39   31.34   14.51   6.94   109.87



5.1 Number of 2^k P Computations in the Window Method

Table 3 shows the number of required computations of 2^k P and additions in
the sliding signed binary window method based on Algorithm 3. The window
size shown in the table is 4 as an example. The numbers were counted by our
implementation: we randomly generated 10000 exponents and counted
the number of operations. The averages of the numbers are shown in Table 3. In
the case of a window of length 4, direct computations of 4P, 8P, and 16P can
be used.

From the table, we can see that with direct computations of up to 16P, the
computational efficiency of 16P significantly affects scalar multiplication.

5.2 Break-Even Point 

Based on the number of computations of 2^k P in scalar multiplication, given in
Table 3, we compared the computational complexity of scalar multiplication. For
example, in the case of a 160-bit scalar, the complexity of scalar multiplication
using direct computation with k = 4 can be evaluated as:
C = 36.82A + 14.93D_2 + 4.99D_4 + 2.58D_8 + 31.75D_16, where A, D_2, D_4, D_8, and
D_16 denote the complexity of the computation for point addition, doubling, 4P,
8P, and 16P, respectively.
The complexity of those point operations can be evaluated using the algorithms
given in the previous sections. For the proposed scalar multiplication with the
window method, we used Algorithm 3, which is based on the sliding window
method with NAF representation for a scalar.
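
The evaluation of C can be reproduced from the averaged operation counts for a 160-bit scalar and the per-operation costs. In the sketch below the Montgomery costs are those stated in the text and in Table 1; the Weierstrass addition cost 2M + S + I is the usual affine cost and is an assumption here, as are the dictionary layout and function name.

```python
# Sketch: evaluate C = 36.82 A + 14.93 D_2 + 4.99 D_4 + 2.58 D_8 + 31.75 D_16
# in units of M and I, assuming S = 0.8M.

counts_160 = {"add": 36.82, "2P": 14.93, "4P": 4.99, "8P": 2.58, "16P": 31.75}

# (M, S, I) cost of each point operation.
montgomery = {"add": (3, 1, 1), "2P": (5, 2, 1), "4P": (18, 7, 1),
              "8P": (25, 11, 1), "16P": (32, 15, 1)}
weierstrass = {"add": (2, 1, 1),          # assumed standard affine addition cost
               "2P": (2, 2, 1), "4P": (9, 9, 1),
               "8P": (13, 13, 1), "16P": (17, 17, 1)}

def total_cost(counts, costs, s_ratio=0.8):
    """Return (multiplication-equivalents, inversions) for the whole scalar multiplication."""
    mults = sum(n * (costs[op][0] + s_ratio * costs[op][1]) for op, n in counts.items())
    invs = sum(n * costs[op][2] for op, n in counts.items())
    return mults, invs

mont_m, mont_i = total_cost(counts_160, montgomery)
weier_m, weier_i = total_cost(counts_160, weierstrass)
```

Under these assumptions the totals come out at roughly 1840M + 91I for the Montgomery form and roughly 1270M + 91I for the Weierstrass form, in line with the "Window with NAF, Proposed" entries of Tables 4 and 5.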

The complexity comparisons in the case of 160-bit are described in Tables 4
and 5. By the “Traditional method”, we mean a scalar multiplication using the
double-and-add method in terms of affine coordinates. Again, we assume that
one squaring has complexity S = 0.8M. For larger sizes, the comparison can be
obtained in the same way.
Table 4. Break-even point in scalar multiplication on a 160-bit elliptic curve with
Montgomery form

                  Method       Complexity     Break-Even Point
Binary            Traditional  1360M + 240I   9.3M < I
                  Proposed     1686M + 205I
NAF               Traditional  1259M + 213I   6.6M < I
                  Proposed     1551M + 169I
Window with NAF   Traditional  1195M + 197I   6.1M < I
                  Proposed     1840M + 91I



Table 5. Break-even point in scalar multiplication on a 160-bit elliptic curve with
Weierstrass form

                  Method       Complexity     Break-Even Point
Binary            Traditional  800M + 240I    6.6M < I
                  Proposed     1030M + 205I
NAF               Traditional  724M + 213I    7.1M < I
                  Proposed     1038M + 169I
Window with NAF   Traditional  679M + 197I    5.6M < I
                  Proposed     1269M + 91I



6 Running Time 

In this section, we present the running times that we obtained with our software 
implementation of the proposed algorithms. 

The platform consisted of a 600MHz Pentium III, which has a 32-bit word,
using Windows 2000, Visual C++ 6.0, and MASM 6.15. The programs were
written in assembly language for multi-precision integer operations, which may
be time-critical in our implementation, and in ANSI C for other operations.

We used the following domain parameters for an elliptic curve with Mont- 
gomery form. 

p = 800000000000000000000000000000000000012b
A = 49cb474d172aadfd987191a490ae0671674fe5a9
B = 17240aee6e1c8c00a7ec1df1b8721d3f90437803
G_u = 31c0186c5389ec1c81d85f4e1449390c954f7f39
G_v = 534a718a33d4e2c2089ac68e48c8f6eb101ec46d
#E_M(F_p) = 800000000000000000005b4c33272e33dfe2cb9c

where (G_u, G_v) ∈ E_M(F_p) and #E_M(F_p) denotes the number of points on E_M.
Table 6. Running time of elliptic curve and field operations in msec

Curve        Elliptic (160-bit)                  Field (160-bit)
             Add    2P     4P    8P    16P       multiply      square        inversion
Montgomery   0.11   0.12   0.19  0.25  0.29      1.92 · 10^-3  1.63 · 10^-3  56.0 · 10^-3
Weierstrass  0.093  0.094  0.13  0.16  0.20



Table 7. Running time of scalar multiplication of a randomly selected point in msec

Curve        Binary                 NAF                    Window with NAF
(160-bit)    Traditional  Proposed  Traditional  Proposed  Traditional  Proposed
Montgomery   27.4         25.7      25.0         22.7      23.3         16.7
Weierstrass  22.5         20.2      20.0         17.1      17.9         12.3



Table 8. Improvement of the performance of scalar multiplications in %

Curve (160-bit)  Binary  NAF  Window with NAF
Montgomery       6       9    28
Weierstrass      10      14   31
31 



Table 6 shows the running times of elliptic curve and definition field
operations. Table 7 shows the running times of scalar multiplications.

We achieved the running time reductions shown in Table 8. As a result of
our implementation with respect to Montgomery and Weierstrass forms in terms
of affine coordinates, we achieved running time reduced by 28% and 31%,
respectively, in the scalar multiplication of an elliptic curve of size 160-bit. The
proposed algorithms improved the performance of scalar multiplication with
the binary method, as well as the window method. Therefore they are effective
in a restricted environment where resources are limited, such as a smart
card.
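
The percentages in Table 8 follow from the running times in Table 7 as (traditional - proposed)/traditional. A minimal sketch of that arithmetic, with the timings copied from Table 7 and an illustrative function name:

```python
# Sketch: relative running-time reduction computed from the Table 7 timings (msec).

times = {
    "Montgomery": {"Binary": (27.4, 25.7), "NAF": (25.0, 22.7),
                   "Window with NAF": (23.3, 16.7)},
    "Weierstrass": {"Binary": (22.5, 20.2), "NAF": (20.0, 17.1),
                    "Window with NAF": (17.9, 12.3)},
}

def improvement(traditional, proposed):
    """Running-time reduction of the proposed method, in percent."""
    return 100.0 * (traditional - proposed) / traditional

reductions = {curve: {method: improvement(*pair) for method, pair in row.items()}
              for curve, row in times.items()}
```

For instance, the window method on the Montgomery form drops from 23.3 msec to 16.7 msec, a reduction of about 28%.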
A Computational Complexity of Direct Computations

In this appendix, we give proofs of Theorems 1 and 2. In these proofs, we ignore the
cost of field additions and subtractions, as well as the cost of multiplications by small
constants.



A.1 Proof of Theorem 1

In Step 1 of Algorithm 1, two multiplications are performed to compute u_1 B and
u_1(2A + 3u_1). The complexity of Step 1 is 2M.

In Step 2, the following computations are performed k times to compute D_i and
E_i, and k - 1 times to compute F_i. We first perform 2 squarings to compute E_{i-1}^2
and F_{i-1}^2. If i > 1, we perform one multiplication to compute
(\prod_{j=0}^{i-1} E_j)^2. Next we perform 2 multiplications for the computation of
D_{i-1} E_{i-1}^2 and C(\prod_{j=0}^{i-1} E_j)^2. Note that (\prod_{j=0}^{i-2} E_j)^2
should be stored in the previous loop of the iteration. This gives D_i,
and so the complexity of computing D_i is 3M + 2S if i > 1 and 2M + 2S if
i = 1. Next, we perform one squaring and one multiplication to compute E_{i-1}^4 and
F_{i-1}(4 D_{i-1} E_{i-1}^2 - D_i). If i = 1, we perform one more multiplication to compute
B_2 E_0^4. This gives E_i, and so the complexity of computing E_i is M + S if i > 1
and 2M + S if i = 1. Next, if i ≠ k, we perform 3 multiplications and one squaring to
compute C D_i, B_2 (\prod_{j=0}^{i-1} E_j)^2,
(\prod_{j=0}^{i-1} E_j)^2 (2^{4i} B_2 (\prod_{j=0}^{i-1} E_j)^2 + 2^{2i+1} C D_i) and D_i^2. This
gives F_i, and so the complexity of computing F_i is 3M + S.

The total complexity of Step 2 is k(4M + 3S) + (k - 1)(3M + S).

In Step 3, we first perform k - 1 multiplications to compute \prod_{i=0}^{k-1} E_i and set
the result to T_1. Next, two multiplications for (\prod_{i=0}^{k-1} E_i)^3 and
B_2 (\prod_{i=0}^{k-1} E_i)^3 are performed.
Note that (\prod_{i=0}^{k-1} E_i)^2 has already been computed in Step 2. Then, we perform one
inversion to compute (2^{3k} B_2 (\prod_{j=0}^{k-1} E_j)^3)^{-1} and set the result to T_2. Next, we
perform one multiplication to compute E_k T_2. Then, we obtain v_{2^k}. The complexity of
computing v_{2^k} is (k - 1)M + 3M + I.

To compute u_{2^k}, we perform 3 multiplications to compute D_k B, D_k B T_1, and
D_k B T_1 T_2. Then, we obtain u_{2^k}. The complexity of computing u_{2^k} is 3M.

According to the above computation, the complexity of Algorithm 1 is
(8k + 4)M + (4k - 1)S + I.

A.2 Proof of Theorem 2

In Step 1 of Algorithm 2, one squaring is performed to compute x_1^2. The complexity
of Step 1 is S.

In Step 2, the following computations are performed k times to compute A_i and
B_i, and k - 1 times to compute C_i. First, we perform 3 squarings to compute B_{i-1}^2,
B_{i-1}^4, and C_{i-1}^2. Second, we perform one multiplication to compute A_{i-1} B_{i-1}^2. Then
we obtain A_i. Third, we perform one multiplication to compute
C_{i-1}(A_i - 4 A_{i-1} B_{i-1}^2).
Then we obtain B_i. Next, we perform one squaring to compute A_i^2. If i = 1, we perform
one multiplication to compute a B_0^4 and set the result to U, and if i > 1, we perform one
multiplication to compute U B_{i-1}^4 and set the result to U. Then, U equals
a (\prod_{j=0}^{i-1} B_j)^4.
Then we obtain C_i. The complexity of Step 2 is (2M + 3S)k + (M + S)(k - 1).

In Step 3, we first compute \prod_{i=0}^{k-1} B_i, which takes k - 1 multiplications. Second, we
perform one inversion to compute (2^k \prod_{i=0}^{k-1} B_i)^{-1} and set the result to T. Next, we
perform one squaring to compute T^2. Next, we perform one multiplication to compute
A_k T^2. Then, we obtain x_{2^k}. Finally, we perform 2 multiplications to compute B_k T^2 T.
Then, we obtain y_{2^k}. The complexity of Step 3 is (k - 1)M + 3M + S + I.

According to the above computation, the complexity of Algorithm 2 is
(4k + 1)M + (4k + 1)S + I.
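
The step-by-step counts in the two proofs can be tallied mechanically. The sketch below simply adds up the per-step costs stated above for a given k; the function names are illustrative.

```python
# Sketch: tally the per-step operation counts from the proofs of Theorems 1 and 2.

def montgomery_cost(k):
    """(M, S, I) for direct computation of 2^k P on a Montgomery curve."""
    m, s = 2, 0                  # Step 1: u_1 B and u_1 (2A + 3 u_1)
    m += 4 * k                   # Step 2: D_i and E_i cost 4M + 3S per iteration
    s += 3 * k
    m += 3 * (k - 1)             # Step 2: F_i costs 3M + S, computed k - 1 times
    s += k - 1
    m += (k - 1) + 3             # Step 3: v-coordinate, (k - 1)M + 3M + I
    m += 3                       # Step 3: u-coordinate, 3M
    return m, s, 1

def weierstrass_cost(k):
    """(M, S, I) for direct computation of 2^k P on a Weierstrass curve."""
    m, s = 0, 1                  # Step 1: x_1^2
    m += 2 * k                   # Step 2: A_i and B_i cost 2M + 3S per iteration
    s += 3 * k
    m += k - 1                   # Step 2: C_i costs M + S, computed k - 1 times
    s += k - 1
    m += (k - 1) + 3             # Step 3: (k - 1)M + 3M + S + I
    s += 1
    return m, s, 1

costs_k4 = (montgomery_cost(4), weierstrass_cost(4))
```

Summing the steps this way gives (8k + 4)M + (4k - 1)S + I and (4k + 1)M + (4k + 1)S + I for every k, matching the two theorems.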
Abstract. This paper will propose an efficient algorithm that utilizes
the signed-digit representation to compute the kth term of a characteristic
sequence generated by a linear feedback shift register of order 3
over GF(q). We will also propose an efficient algorithm to compute the
(h - dk)th term of the characteristic sequence based on the knowledge of
the kth term where k is unknown. Incorporating these results, we construct
the ElGamal-like digital signature algorithm for the public-key
cryptography based on the 3rd-order characteristic sequences which was
proposed by Gong and Harn in 1999.

Key words. Public-key cryptosystem, digital signature, third-order
linear feedback shift register sequences over finite fields.



1 Introduction 

Gong and Harn have published papers on applying third-order linear feedback
shift register (LFSR) sequences with some initial states to construct public-key
cryptosystems (PKC) in ChinaCrypt'98 [4] and in the IEEE Transactions on
Information Theory [5], respectively. This type of LFSR sequence is called a
characteristic sequence in the area of sequence study. The security of the PKC
is based on the difficulty of solving the discrete logarithm (DL) problem in GF(q^3), but
all computations involved in the system are still performed in GF(q).

In [5], Gong and Harn have proposed the Diffie-Hellman (DH) key agreement
protocol [1] and the RSA-like encryption scheme [14] as examples of the applications
of the GH public-key cryptosystem. Along this line, Lenstra and Verheul
[7] have published their XTR public-key system at Crypto 2000. In the XTR
public-key system, they have also used the 3rd-order characteristic sequence,
but with a special polynomial. They have proposed the XTR DH and the XTR
Nyberg-Rueppel signature scheme as examples.

In this paper, we review some fundamental properties of 3rd-order characteristic
sequences and the original GH public-key cryptosystem, and point
out that the XTR cryptosystem is constructed from a special type of 3rd-order
characteristic sequences, as Gong and Harn have analyzed in [5]. Then,
we explore some useful properties of 3rd-order characteristic sequences over
GF(q) and use these results in the construction of the GH ElGamal-like digital
signature algorithm [2] [13].

The paper is organized as follows. In Section 2, we introduce LFSR sequences
and 3rd-order characteristic sequences over GF(q). Then, we review the
original GH public-key cryptosystem and explain the relation between the design
approaches of the GH and XTR public-key cryptosystems. In Section 3, using the
maximal-weight signed-digit representation, we propose a fast algorithm for evaluating
the kth term of a pair of reciprocal characteristic sequences over GF(q)
and discuss the computational complexity for the special case q = p^2. This
algorithm is more efficient than the previously proposed one [5]. In Section 4, we
introduce the Duality Law of a pair of reciprocal characteristic sequences.
Using this law, we show the property of redundancy in states of characteristic
sequences over GF(q). Lenstra and Verheul in [8] have also found this type of
redundancy for a special case of characteristic sequences over GF(p^2). However,
the technique used in [8] cannot be extended to the general case of characteristic
sequences over either GF(p^2) or GF(q) for arbitrary q. In Section
5, using the linear feedback shift register concept, we propose an efficient
algorithm to compute the (h - dk)th term of a characteristic sequence based on
the knowledge of the kth term, where k is unknown to the user. This
property is required for digital signature verification. This algorithm avoids the
matrix computation needed in Algorithm 2.4.8 proposed by Lenstra and Verheul
in [7]. Then we apply these results to the design of the GH ElGamal-like
digital signature algorithm as an example of digital signature schemes for the
GH public-key cryptosystem.

2 Characteristic Sequences and the GH Public-Key Cryptosystem

In this section, we briefly introduce LFSR sequences, characteristic sequences,
and the GH public-key cryptosystem, and explain that the sequences
used to construct the XTR cryptosystem are just a special case of those used in the design
of the GH cryptosystem. Throughout this paper we use the notation K = GF(q), where q = p^r, p
is a prime, and r is a positive integer.

2.1 LFSR Sequences 

Let

    f(x) = x^n - c_{n-1} x^{n-1} - ... - c_1 x - c_0,  c_i \in K,

be a polynomial and

    s = {s_i} = s_0, s_1, s_2, ...,  s_i \in K,

be a sequence over K. If s satisfies the following linear recursive relation

    s_{k+n} = \sum_{i=0}^{n-1} c_i s_{k+i},  k = 0, 1, ...,




286 Guang Gong, Lein Harn, and Huapeng Wu 



then we say that s is an LFSR sequence of order n (generated by f(x)). (s_0, s_1,
..., s_{n-1}) is called an initial state of the sequence s or of f(x). A vector (s_k, s_{k+1},
..., s_{k+n-1}) containing n consecutive terms of s is called a state of s, or the kth
state of s, which is denoted by \mathbf{s}_k.

Example 1. Let K = GF(5), n = 3 and f(x) = x^3 + x - 1, which is an irreducible
polynomial over K. An LFSR sequence generated by f(x) is given below:

3033201244

3013434143

2111001041

1

which has period 31 = 5^2 + 5 + 1, and the initial state is \mathbf{s}_0 = (3, 0, 3).
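The terms of Example 1 can be regenerated with a few lines of Python. This is only an illustrative sketch: the function names `lfsr_terms` and `least_period` are hypothetical, and the recurrence used is s_{k+3} = s_k - s_{k+1} over GF(5), which is the one that reproduces the listed terms (it corresponds to f(x) = x^3 + x - 1).

```python
def lfsr_terms(coeffs, state, p, count):
    """Return the first `count` terms of an LFSR sequence over GF(p).

    coeffs = (c_0, ..., c_{n-1}) defines s_{k+n} = sum_i c_i * s_{k+i} (mod p),
    state  = (s_0, ..., s_{n-1}) is the initial state.
    """
    n = len(coeffs)
    s = [x % p for x in state]
    while len(s) < count:
        k = len(s) - n
        s.append(sum(c * s[k + i] for i, c in enumerate(coeffs)) % p)
    return s[:count]


def least_period(terms, n):
    """Smallest t > 0 at which the initial n-term state reappears, or None."""
    for t in range(1, len(terms) - n + 1):
        if terms[t:t + n] == terms[:n]:
            return t
    return None


# Example 1 data: GF(5), initial state (3, 0, 3), s_{k+3} = s_k - s_{k+1}.
TERMS = lfsr_terms((1, -1, 0), (3, 0, 3), 5, 40)
PERIOD = least_period(TERMS, 3)
```

Because the state of an LFSR determines all later terms, the first reappearance of the initial state gives the period of the sequence.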



2.2 Irreducible Case and Trace Representation 

If f(x) is an irreducible polynomial over K, let \alpha be a root of f(x) in the
extension E = GF(q^n); then there exists some \beta \in E such that

    s_i = Tr(\beta \alpha^i),  i = 0, 1, 2, ...,

where Tr(x) = x + x^q + ... + x^{q^{n-1}} is the trace function from E to K. If \beta = 1,
then s is called a characteristic sequence of f(x), or a char-sequence for short.
The sequence given in Example 1 is the characteristic sequence of f(x).



2.3 Period and Order 

Let f(x) \in K[x]. We say that f(x) has period t if t is the smallest positive integer such
that f(x) | x^t - 1. We denote it as per(f) = t.

For \beta \in E, the order of \beta is the smallest positive integer t such that

    \beta^t = 1.

We denote it as ord(\beta) = t. A proof of the following result can be found in
several references on sequences, for example, in [9,10].

Lemma 1. If f(x) \in K[x] is irreducible over K and s is generated by f(x),
then

    per(s) = per(f) = ord(\alpha),

where \alpha is a root of f(x) in the extension GF(q^n).

2.4 Third-Order Characteristic Sequences 

Let

    f(x) = x^3 - a x^2 + b x - 1,  a, b \in K    (1)

be an irreducible polynomial over K and \alpha be a root of f(x) in the extension
field GF(q^3). Let s = {s_i} be a characteristic sequence generated by f(x). Then
the initial state of {s_i} is given by

    s_0 = 3, s_1 = a, and s_2 = a^2 - 2b.

Example 1 is a characteristic sequence of order 3.

We list the following lemmas which appeared in [5]. 

Lemma 2. With the same notation, we have

— per(s) | q^2 + q + 1, i.e., the period of s is a factor of q^2 + q + 1.

— s has the following trace representation:

    s_k = Tr(\alpha^k) = \alpha^k + \alpha^{kq} + \alpha^{kq^2},  k = 0, 1, 2, ....

If the order of \alpha satisfies an additional condition, then we have the following
result, whose proof can be found in [7].

Lemma 3. With the same notation, let K = GF(p^2). If ord(\alpha) | p^2 - p + 1, then
f(x) = x^3 - a x^2 + a^p x - 1, a \in K.

2.5 Fundamental Results on 3rd-Order Char-sequences 

Here we summarize some results obtained previously. In [5], all results on 3rd-order
char-sequences related to the GH Diffie-Hellman key agreement protocol
are presented in the finite field GF(p). However, all these results are also true
in K = GF(q). So we just list the following two lemmas; their proofs
are exactly the same as those given in [5]. Similar results can also be
found in Corollary 2.3.5 in [7]. For the sake of simplicity, we write s_k = s_k(a, b)
or s_k(f) to indicate the generating polynomial. Let f^{-1}(x) = x^3 - b x^2 + a x - 1,
which is the reciprocal polynomial of f(x). Let {s_k(b, a)} be the char-sequence of
f^{-1}(x), called the reciprocal sequence of {s_k(a, b)}_{k \ge 0}. Then we have s_{-k}(a, b) =
s_k(b, a), k = 1, 2, ... (see [5]).

Lemma 4. Let f(x) = x^3 - a x^2 + b x - 1 be an irreducible polynomial over K,
s be the char-sequence of f(x), and \alpha be a root of f(x) in GF(q^3). Then

1. For all integers r and e,

    s_r(s_e(a, b), s_{-e}(a, b)) = s_{re}(a, b).

2. For all integers n and m,

(a) s_{2n} = s_n^2 - 2 s_{-n}, and

(b) s_n s_m - s_{n-m} s_{-m} = s_{n+m} - s_{n-2m}.

3. If gcd(k, per(s)) = 1, then \alpha^{k q^i}, i = 0, 1, 2, are the three roots of g(x) = x^3 -
s_k x^2 + s_{-k} x - 1 in GF(q^3).
Lemma 5. Let k = k_0 k_1 ... k_r = \sum_{i=0}^{r} k_i 2^{r-i} be the binary representation
of k. Let T_0 = k_0 = 1, and T_j = k_j + 2T_{j-1}, 1 \le j \le r. So, T_r = k. If
(s_{T_{j-1}-1}, s_{T_{j-1}}, s_{T_{j-1}+1}) is computed, then (s_{T_j-1}, s_{T_j}, s_{T_j+1}) can be computed
according to the following formulas.

For k_j = 0

    s_{T_j-1} = s_{T_{j-1}} s_{T_{j-1}-1} - b s_{-T_{j-1}} + s_{-(T_{j-1}+1)}    (2)

    s_{T_j}   = s_{T_{j-1}}^2 - 2 s_{-T_{j-1}}    (3)

    s_{T_j+1} = s_{T_{j-1}} s_{T_{j-1}+1} - a s_{-T_{j-1}} + s_{-(T_{j-1}-1)}    (4)

For k_j = 1

    s_{T_j-1} = s_{T_{j-1}}^2 - 2 s_{-T_{j-1}}    (5)

    s_{T_j}   = s_{T_{j-1}} s_{T_{j-1}+1} - a s_{-T_{j-1}} + s_{-(T_{j-1}-1)}    (6)

    s_{T_j+1} = s_{T_{j-1}+1}^2 - 2 s_{-(T_{j-1}+1)}    (7)

Thus, calculating a pair of kth terms s_k and s_{-k} of the sequence s needs
9 log k multiplications in GF(q) on average.
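The doubling formulas (2)-(7) translate directly into a left-to-right binary ladder. The following Python sketch works over a prime field GF(p) for simplicity (the paper allows any GF(q)); the names `char_pair` and `char_pair_naive` are hypothetical, and the dual triple is updated with the same formulas after exchanging a with b and s_j with s_{-j}.

```python
def char_pair_naive(k, a, b, p):
    """Reference: (s_k, s_{-k}) obtained by stepping both recurrences k times."""
    def term(n, a, b):
        s0, s1, s2 = 3 % p, a % p, (a * a - 2 * b) % p
        for _ in range(n):
            s0, s1, s2 = s1, s2, (a * s2 - b * s1 + s0) % p
        return s0
    return term(k, a, b), term(k, b, a)      # s_{-k}(a, b) = s_k(b, a)


def char_pair(k, a, b, p):
    """(s_k, s_{-k}) for k >= 1 via the ladder of Lemma 5 (formulas (2)-(7))."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    f = (3 % p, a % p, (a * a - 2 * b) % p)  # (s_{t-1}, s_t, s_{t+1}), t = 1
    d = (3 % p, b % p, (b * b - 2 * a) % p)  # (s_{-(t-1)}, s_{-t}, s_{-(t+1)})
    for bit in bin(k)[3:]:                   # digits k_1 ... k_r (k_0 = 1)
        f0, f1, f2 = f
        d0, d1, d2 = d
        lo = ((f1 * f0 - b * d1 + d2) % p, (d1 * d0 - a * f1 + f2) % p)   # index 2t-1
        mid = ((f1 * f1 - 2 * d1) % p, (d1 * d1 - 2 * f1) % p)            # index 2t
        hi = ((f1 * f2 - a * d1 + d0) % p, (d1 * d2 - b * f1 + f0) % p)   # index 2t+1
        if bit == "0":                       # T_j = 2t: formulas (2)-(4)
            f, d = (lo[0], mid[0], hi[0]), (lo[1], mid[1], hi[1])
        else:                                # T_j = 2t+1: formulas (5)-(7)
            top = ((f2 * f2 - 2 * d2) % p, (d2 * d2 - 2 * f2) % p)        # index 2t+2
            f, d = (mid[0], hi[0], top[0]), (mid[1], hi[1], top[1])
    return f[1], d[1]
```

The ladder needs a number of steps proportional to log k, whereas the naive reference needs k steps; the reference is included only to cross-check the formulas.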

2.6 The GH Diffie-Hellman Key Agreement Protocol

In this subsection, we review the GH Diffie-Hellman (DH) key agreement
protocol. (Note. In [5], the GH-DH was presented in GF(p). As we have mentioned
at the beginning of Section 2.5, all results are also true in GF(q),
where q is a power of a prime.) In the following discussion, we present the
GH-DH in GF(q), where q = p^2, in the same setting as in the XTR cryptosystem.

GH-DH Key Agreement Protocol (Gong and Harn, 1999) [5]:

System parameters: p is a prime number, q = p^2 and f(x) = x^3 - a x^2 + b x - 1,
which is an irreducible polynomial over GF(q) with period Q = q^2 + q + 1.

User Alice chooses e, 0 < e < Q, with gcd(e, Q) = 1 as her private key and
computes (s_e, s_{-e}) as her public key. Similarly, user Bob has r, 0 < r < Q,
with gcd(r, Q) = 1 as his private key and (s_r, s_{-r}) as his public key. In the key
distribution phase, Alice uses Bob's public key to form a polynomial

    g(x) = x^3 - s_r x^2 + s_{-r} x - 1

and then computes the eth terms of a pair of reciprocal char-sequences generated
by g(x). I.e., Alice computes

    s_e(s_r, s_{-r}) and s_{-e}(s_r, s_{-r}).

Similarly, Bob computes

    s_r(s_e, s_{-e}) and s_{-r}(s_e, s_{-e}).

They share the common secret key (s_{er}, s_{-er}).
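A toy run of the key agreement can be sketched in Python with the parameters of Example 1 (GF(5), a = 0, b = 1, period Q = 31). These values are far too small for any real use and the prime field GF(5) replaces the GF(p^2) setting of the protocol; the function names and the private keys 7 and 12 are hypothetical. Terms are computed by stepping the recurrence directly, which stands in for a fast algorithm.

```python
def char_term(k, a, b, p):
    """k-th term (k >= 0) of the char-sequence of x^3 - a x^2 + b x - 1 over GF(p)."""
    s0, s1, s2 = 3 % p, a % p, (a * a - 2 * b) % p
    for _ in range(k):
        s0, s1, s2 = s1, s2, (a * s2 - b * s1 + s0) % p
    return s0


def char_pair(k, a, b, p):
    """(s_k, s_{-k}); the dual term uses s_{-k}(a, b) = s_k(b, a)."""
    return char_term(k, a, b, p), char_term(k, b, a, p)


P, A, B, Q = 5, 0, 1, 31           # toy system parameters (Example 1)
E_ALICE, R_BOB = 7, 12             # private keys, both coprime to Q

ALICE_PUB = char_pair(E_ALICE, A, B, P)               # (s_e, s_{-e})
BOB_PUB = char_pair(R_BOB, A, B, P)                   # (s_r, s_{-r})

# Each side evaluates the char-sequence of g(x) built from the other's public key.
ALICE_KEY = char_pair(E_ALICE, BOB_PUB[0], BOB_PUB[1], P)
BOB_KEY = char_pair(R_BOB, ALICE_PUB[0], ALICE_PUB[1], P)
DIRECT = char_pair((E_ALICE * R_BOB) % Q, A, B, P)    # (s_{er}, s_{-er})
```

Both parties arrive at the same pair because of item 1 of Lemma 4.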

Let Z_Q = {0, 1, 2, ..., Q - 1}, let Z_Q^* contain all numbers in Z_Q that
are coprime with Q, and let R_Q contain all numbers in Z_Q^* that are
not conjugate modulo Q with respect to q (i.e., two numbers t and r are conjugate
modulo Q if there exists some integer j such that r = t q^j mod Q).

The mathematical function used in the GH public-key cryptosystem is:

    R_Q -> K x K
      i |-> (s_i, s_{-i})

where s is the 3rd-order char-sequence over GF(q) generated by f(x), which is
an irreducible polynomial with a period of Q = q^2 + q + 1. In [5], it is shown
that this is an injective map from R_Q to K x K.

Remark 1. The XTR [7] is designed based on the char-sequences generated by
the 3rd-order polynomial f(x) = x^3 - a x^2 + a^p x - 1, which is irreducible over
GF(q) with period Q | p^2 - p + 1. The XTR only uses one char-sequence instead
of a pair of reciprocal char-sequences. The mathematical function used in the
XTR public-key system is:

    R_Q -> K
      i |-> s_i

However, the GH system is based on the char-sequences generated by the 3rd-order
polynomial x^3 - a x^2 + b x - 1, where a and b are from GF(p^2). Thus,
the 3rd-order char-sequence used to construct the XTR cryptosystem is just
a special case of those used in the design of the GH cryptosystem. Two schemes have
the same efficiency when they are applied to the DH key agreement protocol, 
because the GH-DH computes a pair of elements over GF{p^) and shares a pair 
of elements over GF{p^), and the XTR-DH computes one element over GF{p^) 
and shares one element over GF{p^). 

In the following sections, we explore some useful properties of 3rd-order
char-sequences over K and apply these results to the design of a new GH digital
signature algorithm. For additional results on LFSR sequences and finite fields,
the reader can refer to [3,9,10].

3 Fast Computational Algorithm Based on the Signed-Digit Representation

3.1 A New Signed-Digit Number Representation

Definition 1. Let A = a_{n-1} a_{n-2} ... a_0, a_i \in {0, 1}, be a binary representation.
Then A = b_{n-1} b_{n-2} ... b_0, b_i \in {-1, 0, 1}, is called the binary maximal-weight
signed-digit (SD) representation of A if there does not exist another binary SD
representation of length n for A whose Hamming weight is higher than that of
the maximal-weight SD representation.
An algorithm to obtain such an SD representation is given in Appendix A. The
following lemma is obvious from Algorithm 4 in Appendix A.

Lemma 6. Let a_{n-1} a_{n-2} ... a_0, a_i \in {0, 1} and a_{n-1} = 1, be the binary representation
of integer A. Let d, 0 \le d \le n - 2, be the smallest integer such that
a_d \ne 0. Then the Hamming weight of the binary maximal-weight SD representation
of A is n - d. Moreover, all the zeroes are associated with least significant
bit positions.

Some examples of maximal-weight SD representations, where \bar{1} denotes the digit -1:
101100111 = 1 1 \bar{1} 1 1 \bar{1} \bar{1} 1 1 and 11001011000 = 1 1 1 \bar{1} \bar{1} 1 \bar{1} 1 0 0 0. When this
SD representation is involved in computing exponentiation-like functions using the
square-and-multiply method, it is obvious that efficiency can only be achieved
when the "squaring and multiplication" is less expensive than the "squaring".
This appears to be the case when computing terms in a third-order recurrence
sequence, as will be discussed in detail in the next subsections.

3.2 Fast Computational Algorithm of Recurrence Terms 

Let k be given in its maximal-weight SD representation as k = k_0 k_1 ... k_r =
\sum_{i=0}^{r} k_i 2^{r-i}, k_i \in {-1, 0, 1}. It can be proven that Lemma 5 still holds true if
we add the following formulas.

For k_j = -1

    s_{T_j-1} = s_{T_{j-1}-1}^2 - 2 s_{-(T_{j-1}-1)}    (10)

    s_{T_j}   = s_{T_{j-1}} s_{T_{j-1}-1} - b s_{-T_{j-1}} + s_{-(T_{j-1}+1)}    (11)

    s_{T_j+1} = s_{T_{j-1}}^2 - 2 s_{-T_{j-1}}    (12)

With values of initial terms as s_0 = 3, s_1 = a, s_2 = a^2 - 2b, s_{-1} = b, and
s_{-2} = b^2 - 2a, T_0 = k_0 = 1 and T_j = k_j + 2T_{j-1}, 1 \le j \le r, T_r = k, a pair of dual
terms, s_k and s_{-k} for k > 0, can be computed based on the following algorithm.

Algorithm 1 Computing s_{\pm k}

1. Set up initial values: s_{T_0-1} = s_{-T_0+1} = 3; s_{T_0} = a; s_{T_0+1} = a^2 - 2b; s_{-T_0} =
b; s_{-T_0-1} = b^2 - 2a;

2. IF k_r = 0 THEN find h < r such that k_h \ne 0 and k_{h+1} = k_{h+2} = ... =
k_r = 0, ELSE h = r;

3. IF h > 1 THEN FOR i = 1 TO h - 1

(a) IF k_i = 1 THEN

i. compute s_{T_i} and s_{T_i \pm 1} using (5)-(7);

ii. compute s_{-T_i} and s_{-T_i \pm 1} using (10)-(12);

(b) ELSE

i. compute s_{T_i} and s_{T_i \pm 1} using (10)-(12);

ii. compute s_{-T_i} and s_{-T_i \pm 1} using (5)-(7);

4. FOR i = Max{1, h} TO r
(a) compute s_{\pm T_i} using (3);
The final value is s_{T_r} = s_k. Note that T_j and T_{j-1} should be respectively replaced
by -T_j and -T_{j-1} in (2)-(12) when these formulas are used in computing s_{-T_i}
and s_{-T_i \pm 1}, as shown in Steps 3(a)(ii), 3(b)(ii) and 4(a) in the above algorithm.

Since the implementation of (5)-(7) or (10)-(12) is less costly than that of
(2)-(4), certain efficiency can be achieved by using the maximal-weight SD
representations. It can be shown that an evaluation of (5)-(7) or (10)-(12) needs one
multiplication, two squarings, and one constant multiplication in GF(q), while an
evaluation of (3) requires one squaring in GF(q). Thus, Step 3 needs two
multiplications, four squarings, and two constant multiplications in GF(q), while
Step 4 requires two squarings in GF(q). Assuming that Step 3(a) or
Step 3(b) has to be performed h - 1 times and Step 4(a) has to be executed
r - h + 1 times, and using the estimate of the average value of h
given in Appendix B, we have the following lemma:

Lemma 7. Let k be given in its maximal-weight SD representation, with
log_2 k > 10. Then, in the average case (which is also the worst case), computing
a dual pair {s_{-k}, s_k} using Algorithm 1 needs 4 log_2 k multiplications and
4 log_2 k squarings in GF(q). In the best case, computing both s_k and s_{-k} needs
2 log_2 k squarings in GF(q).

Note that half of the 4 log_2 k multiplications are in fact constant multiplications
if both a and b are constant.

3.3 Complexity of Computing Recurrence Terms When q = p^2

When -1 is a quadratic non-residue in GF(p), the binomial f(x) = x^2 + 1
is irreducible over GF(p). Let \alpha be a root of f(x). Under the previous assumptions,
{1, \alpha} forms a polynomial basis of GF(p^2) over GF(p). Any two elements
x, y \in GF(p^2) can be represented in the polynomial basis as x =
x_0 + x_1 \alpha and y = y_0 + y_1 \alpha, x_0, x_1, y_0, y_1 \in GF(p). (It is worth mentioning
that, in the XTR system [7], the irreducible trinomial f(x) = x^2 + x + 1 and
the normal basis have been chosen.) A multiplication in GF(p^2) can be represented
by

    xy = (x_0 + x_1 \alpha)(y_0 + y_1 \alpha) = (x_0 y_0 - x_1 y_1) + (x_0 y_1 + x_1 y_0) \alpha
       = [x_0 (y_0 + y_1) - y_1 (x_0 + x_1)] + [x_0 (y_0 + y_1) + y_0 (x_1 - x_0)] \alpha.

Thus, three multiplications in GF(p) are needed. Since the squaring can be represented by

    x^2 = (x_0 + x_1 \alpha)^2 = (x_0^2 - x_1^2) + 2 x_0 x_1 \alpha = (x_0 - x_1)(x_0 + x_1) + 2 x_0 x_1 \alpha,

two multiplications in GF(p) are required for a squaring in GF(p^2). The constant
multiplication is the case where the multiplicand is a fixed element. If
the constant element can be chosen to be a number of a specific form, then
the constant multiplication can be extremely efficient. For example, if both x_0
and x_1 can be chosen to be small powers of two, then it can be seen from
xy = (x_0 + x_1 \alpha)(y_0 + y_1 \alpha) = (x_0 y_0 - x_1 y_1) + (x_0 y_1 + x_1 y_0) \alpha that the
multiplication xy can be obtained for free. If we choose the constant element
x = x_0 + x_1 \alpha such that one of the two coefficients x_0 and x_1 is a small power of
2, then only two multiplications in GF(p) are needed to perform the multiplication
xy in GF(p^2). We summarize the above results in the following lemma:
Lemma 8.

1. A multiplication in GF(p^2) can be realized by performing three multiplications
in GF(p).

2. A squaring in GF(p^2) needs two multiplications in GF(p).

3. A constant multiplication in GF(p^2)

(a) can be realized for free if both the coefficients of the constant element can
be chosen to be a small power of 2.

(b) can be realized by performing two multiplications in GF(p), if one of the
two coefficients of the constant element can be chosen as a small power
of 2.
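Items 1 and 2 of Lemma 8 can be checked with a short Python sketch of arithmetic in GF(p^2) = GF(p)[w]/(w^2 + 1). Elements are represented as pairs (x0, x1) meaning x0 + x1*w; the function names are hypothetical, and p = 7 is only a convenient toy prime with p = 3 mod 4, so that -1 is a quadratic non-residue.

```python
def mul3(x, y, p):
    """Product in GF(p^2) with w^2 = -1, using three multiplications in GF(p)."""
    x0, x1 = x
    y0, y1 = y
    m1 = x0 * (y0 + y1) % p
    m2 = y1 * (x0 + x1) % p
    m3 = y0 * (x1 - x0) % p
    return ((m1 - m2) % p, (m1 + m3) % p)


def sqr2(x, p):
    """Square in GF(p^2) using two multiplications in GF(p)."""
    x0, x1 = x
    m1 = (x0 - x1) * (x0 + x1) % p
    m2 = x0 * x1 % p
    return (m1, (m2 + m2) % p)        # doubling is an addition, not a multiplication


def mul_schoolbook(x, y, p):
    """Reference product with four multiplications in GF(p)."""
    x0, x1 = x
    y0, y1 = y
    return ((x0 * y0 - x1 * y1) % p, (x0 * y1 + x1 * y0) % p)


def all_products_agree(p):
    """Exhaustively compare mul3 and sqr2 with the schoolbook product."""
    elems = [(u, v) for u in range(p) for v in range(p)]
    return all(mul3(x, y, p) == mul_schoolbook(x, y, p) for x in elems for y in elems) \
        and all(sqr2(x, p) == mul_schoolbook(x, x, p) for x in elems)
```

The three-multiplication form trades one multiplication for a few extra additions, which is the usual Karatsuba-style saving.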

From Lemmas 7 and 8, we can find the complexity of computing s_{\pm k} using
Algorithm 1, and we summarize this result in the following lemma:

Lemma 9. Let q = p^2 and k be given in its maximal-weight SD representation,
with log_2 k > 10. Then, in the average case (which can also be the worst case),
computing a dual pair {s_{-k}, s_k} using Algorithm 1 needs:

1. at most 20 log_2 k multiplications in GF(p);

2. at most 18 log_2 k multiplications in GF(p), if one of the two coefficients of
the constant elements a, b \in GF(p^2) can be chosen to be a small power of 2;

3. at most 16 log_2 k multiplications in GF(p), if one constant element is chosen
in such a way that one of the coefficients is a small power of 2, and the other
constant element is chosen such that both coefficients are small powers of 2;

4. at most 14 log_2 k multiplications in GF(p), if both constant elements can
be chosen such that all the coefficients are small powers of 2.

In the best case, computing both s_k and s_{-k} needs 4 log_2 k multiplications in
GF(p).
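The four bounds of Lemma 9 follow from simple bookkeeping, which the Python sketch below reproduces. It assumes, per digit of k, two general multiplications, four squarings, and two constant multiplications (one by a and one by b) in GF(p^2), and it assumes that a constant multiplication with no special structure costs as much as a general one; the names `per_digit_cost`, `GF_P_COST` and `CONST_COST` are hypothetical.

```python
GF_P_COST = {"mul": 3, "sqr": 2}                       # Lemma 8, items 1 and 2
CONST_COST = {"generic": 3, "one_small": 2, "both_small": 0}   # Lemma 8, item 3


def per_digit_cost(const_a, const_b):
    """GF(p) multiplications per digit of k for one pass of Step 3 of Algorithm 1."""
    general = 2 * GF_P_COST["mul"]                     # two general multiplications
    squarings = 4 * GF_P_COST["sqr"]                   # four squarings
    constants = CONST_COST[const_a] + CONST_COST[const_b]   # by a and by b
    return general + squarings + constants
```

Multiplying the per-digit cost by log_2 k gives the totals listed in the lemma.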

4 Redundancy in States of the 3rd-Order Char-Sequences 

In this section, we introduce the duality law of a pair of reciprocal
char-sequences. Under this duality law, we can address some redundancy in states of
char-sequences over K.

4.1 Duality Law 

Let f(x) = x^3 - a x^2 + b x - 1 be an irreducible polynomial over K and s be its
char-sequence. We define a dual operator as given below:

    D(s_k) = s_{-k},

    D(s_k, s_{k+1}, ..., s_{k+t}) = (s_{-k}, s_{-(k+1)}, ..., s_{-(k+t)}),  t \ge 0,

where T = (s_k, s_{k+1}, ..., s_{k+t}) is a segment of s.

We call (s_k, s_{k+1}, ..., s_{k+t}) and D(s_k, s_{k+1}, ..., s_{k+t}) a dual segment of s or
f(x). If t = 0, we call s_k and s_{-k} a dual pair of s or f(x).
Let h(x_1, x_2, ..., x_t) \in K[x_1, x_2, ..., x_t], i.e., h is a multivariate polynomial
over K. We define

    D(h(s_{i_1}, s_{i_2}, ..., s_{i_t})) = h(s_{-i_1}, s_{-i_2}, ..., s_{-i_t}),  i_j \in Z.

Duality Law. Let f(x) = x^3 - a x^2 + b x - 1 be an irreducible polynomial
over K, s be its char-sequence and D be the dual operator. Then D(D(T)) = T,
D(D(h)) = h and

    h(s_{i_1}, s_{i_2}, ..., s_{i_t}) = 0  <=>  h(s_{-i_1}, s_{-i_2}, ..., s_{-i_t}) = 0.



4.2 Property of Redundancy 

In the following theorem, we will show that three elements in any state of the 
3rd-order char-sequence are not independent. If we know any two consecutive 
elements, the third remaining one can be uniquely determined according to a 
formula. 

Theorem 1. Let f(x) = x^3 - a x^2 + b x - 1 be an irreducible polynomial over
K and s be its char-sequence. Given the dual segment (s_k, s_{k+1}) and (s_{-k},
s_{-(k+1)}), assume that \Delta = s_{k+1} s_{-(k+1)} - s_1 s_{-1} \ne 0. Then s_{k-1} and its dual
s_{-(k-1)} can be computed by the following formulas:

    s_{k-1} = (e s_{-(k+1)} - s_{-1} D(e)) / \Delta    (13)

    s_{-(k-1)} = (D(e) s_{k+1} - s_1 e) / \Delta    (14)

where

e = -s_{-1} D(c_1) + c_2, where c_1 = s_1 s_{k+1} - s_{-1} s_k and c_2 = s_k^2 - 3 s_{-k} + (b^2 - a) s_{-(k+1)}.

(Note. Here s_1 = a and s_{-1} = b. In order to keep symmetric forms in the
formulas, we keep on using s_1 and s_{-1}.)

Proof. A sketch of the proof is given below. From U = (s_{k-1}, s_k, s_{k+1},
s_{k+2}) and its dual, we form four linear equations in the four variables
s_{k+2}, s_{-(k+2)}, s_{k-1}, s_{-(k-1)}. Then, using linear algebra, we solve these
equations and obtain (13) and (14).

We now construct the linear equations. Since U is a segment of s
generated by f(x), it satisfies the linear recurrence relation

    s_{k+2} = s_1 s_{k+1} - s_{-1} s_k + s_{k-1}.

Thus, we have

    s_{k+2} - s_{k-1} = c_1, where c_1 = s_1 s_{k+1} - s_{-1} s_k.    (15)
Applying the Duality Law to the above expression, we have

    s_{-(k+2)} - s_{-(k-1)} = D(c_1).    (16)

Letting n = k - 1 and m = k + 1 in formula 2(b) of Lemma 4, we get

    s_{k-1} s_{k+1} - s_{-2} s_{-(k+1)} = s_{2k} - s_{-(k+3)}.    (17)

Since (s_{-k}, s_{-(k+1)}, s_{-(k+2)}, s_{-(k+3)}) is a segment of the reciprocal sequence,
it satisfies the following linear recursive relation

    s_{-(k+3)} = s_{-1} s_{-(k+2)} - s_1 s_{-(k+1)} + s_{-k}.

Note that s_{2k} = s_k^2 - 2 s_{-k} from 2(a) in Lemma 4. Substituting s_{-(k+3)} and s_{2k}
in (17) with the above two identities, it follows that

    s_{k-1} s_{k+1} - s_{-2} s_{-(k+1)} = s_k^2 - 2 s_{-k} - s_{-1} s_{-(k+2)} + s_1 s_{-(k+1)} - s_{-k}.

Since s_{-2} + s_1 = b^2 - 2a + a = b^2 - a, we have

    s_{-1} s_{-(k+2)} + s_{k+1} s_{k-1} = c_2, where c_2 = s_k^2 - 3 s_{-k} + (b^2 - a) s_{-(k+1)}.    (18)

By the Duality Law, we have

    s_1 s_{k+2} + s_{-(k+1)} s_{-(k-1)} = D(c_2).    (19)

The equations (15), (16), (18), and (19) form four linear equations in the
variables s_{k+2}, s_{-(k+2)}, s_{k-1}, s_{-(k-1)}. Let A be the matrix of the coefficients of
this linear system, i.e.,

        ( 1     0        -1         0          )
    A = ( 0     1         0        -1          )
        ( 0     s_{-1}    s_{k+1}   0          )
        ( s_1   0         0         s_{-(k+1)} )

Therefore, this linear system can be written as

    A S^T = C^T,    (20)

where S = (s_{k+2}, s_{-(k+2)}, s_{k-1}, s_{-(k-1)}), C = (c_1, D(c_1), c_2, D(c_2)) and X^T is
the transpose of the vector X. Let \tilde{A} = (A, C^T). Then the reduced row-echelon
form of \tilde{A} is given below:

    ( 1   0   -1         0            c_1    )
    ( 0   1    0        -1            D(c_1) )
    ( 0   0    s_{k+1}   s_{-1}       e      )
    ( 0   0    s_1       s_{-(k+1)}   D(e)   )

where e = -s_{-1} D(c_1) + c_2. Thus (20) has a unique solution if and only if
det(B) \ne 0, where

    B = ( s_{k+1}   s_{-1}     )
        ( s_1       s_{-(k+1)} )

Since det(B) = \Delta \ne 0, s_{k-1} is given by

    s_{k-1} = det ( e      s_{-1}     ) / \Delta,
                  ( D(e)   s_{-(k+1)} )

which yields (13). The validity of (14) follows from the Duality Law.



Corollary 1. With the same notation as used in Theorem 1, the dual pair s_{k+2}
and s_{-(k+2)} are given by

    s_{k+2} = s_{k-1} + c_1 and s_{-(k+2)} = s_{-(k-1)} + D(c_1).
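Formulas (13) and (14) can be exercised numerically. The Python sketch below works over a prime field GF(p) for simplicity and uses hypothetical function names; terms with negative index are obtained from the reciprocal sequence, s_{-k}(a, b) = s_k(b, a), and D(c_2) is taken as s_{-k}^2 - 3 s_k + (a^2 - b) s_{k+1}, i.e., c_2 with a and b exchanged along with the signs of the indices.

```python
def char_term(k, a, b, p):
    """k-th term (k >= 0) of the char-sequence of x^3 - a x^2 + b x - 1 over GF(p)."""
    s0, s1, s2 = 3 % p, a % p, (a * a - 2 * b) % p
    for _ in range(k):
        s0, s1, s2 = s1, s2, (a * s2 - b * s1 + s0) % p
    return s0


def recover_prev(sk, sk1, dk, dk1, a, b, p):
    """Given s_k, s_{k+1} and duals s_{-k}, s_{-(k+1)}, return (s_{k-1}, s_{-(k-1)}).

    Returns None when Delta = 0, where Theorem 1 does not apply.
    """
    s1, sm1 = a % p, b % p
    delta = (sk1 * dk1 - s1 * sm1) % p
    if delta == 0:
        return None
    c1 = (s1 * sk1 - sm1 * sk) % p
    dc1 = (sm1 * dk1 - s1 * dk) % p                      # D(c_1)
    c2 = (sk * sk - 3 * dk + (b * b - a) * dk1) % p
    dc2 = (dk * dk - 3 * sk + (a * a - b) * sk1) % p     # D(c_2)
    e = (c2 - sm1 * dc1) % p
    de = (dc2 - s1 * c1) % p                             # D(e)
    inv = pow(delta, -1, p)                              # modular inverse (Python 3.8+)
    return ((e * dk1 - sm1 * de) * inv % p, (de * sk1 - s1 * e) * inv % p)


def all_consistent(a, b, p, upto):
    """Compare the formulas with directly computed terms for k = 1 .. upto."""
    for k in range(1, upto + 1):
        got = recover_prev(char_term(k, a, b, p), char_term(k + 1, a, b, p),
                           char_term(k, b, a, p), char_term(k + 1, b, a, p), a, b, p)
        if got is not None and got != (char_term(k - 1, a, b, p), char_term(k - 1, b, a, p)):
            return False
    return True
```

For the sequence of Example 1 (p = 5, a = 0, b = 1) and k = 2, the inputs are s_2 = 3, s_3 = 3, s_{-2} = 1, s_{-3} = 4, and the formulas return (s_1, s_{-1}) = (0, 1).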

Remark 2. If \Delta \ne 0, then the three elements in a state (s_{k-1}, s_k, s_{k+1}) and their
duals are dependent. With the knowledge of any two consecutive elements and
their duals, the third one and its dual can be uniquely determined by Theorem 1.
If \Delta = 0, then the third element in a state of s may have more than one solution.
The case of knowing (s_{k-1}, s_k) and its dual and computing s_{k+1} and its dual
is similar to the case discussed above, so we do not include the
discussion here.



Remark 3. In [8], Lenstra et al. have also given a formula to compute s_{k-1}
(or s_{k+1}) with the knowledge of (s_k, s_{k+1}) (or (s_{k-1}, s_k)) for the special case of
K = GF(p^2) and f(x) = x^3 - a x^2 + a^p x - 1. Here, we have discussed more
general cases and proposed a simpler proof. The formulas of these two
approaches are different. The technique used in [8] cannot be extended to the
general case of the char-sequences.



5 The GH Digital Signature Algorithm 

In this section, we explain the method to evaluate s_{c(h-dk)} and its dual with the
knowledge of s_k and its dual, but without knowing k. Then, we apply this result
together with Theorem 1 in Section 4 and Algorithm 1 in Section 3 to the design
of the GH ElGamal-like digital signature algorithm (GH-DSA).

5.1 Computation of a Mixed Term s_{c(h-dk)}

The following lemma is a direct result from the definition of LFSR sequences. 

Lemma 10. With the same notation for f(x) and s, let (s_{k-1}, s_k, s_{k+1}) be a state of
s and u be a sequence generated by f(x) with (s_{k-1}, s_k, s_{k+1}) as an initial state,
i.e.,

    u_0 = s_{k-1}, u_1 = s_k, and u_2 = s_{k+1}.

Then (u_{v-1}, u_v, u_{v+1}), the (v-1)th state of u, is equal to the (v-1+k)th
state of s. In other words, we have

    (u_{v-1}, u_v, u_{v+1}) = (s_{v-1+k}, s_{v+k}, s_{v+1+k}).

For simplicity, we denote by ((s_{k-1}, s_k, s_{k+1}), f(x)) a sequence generated by
f(x) with an initial state (s_{k-1}, s_k, s_{k+1}).

Algorithm 2 Assume that f(x), (s_k, s_{k+1}) and its dual are given. Let Q =
per(s). Assume that c, h and d are given integers with gcd(d, Q) = 1. Then
s_{c(h-dk)} and its dual can be computed according to the following procedure:

1. Compute v = -h d^{-1} mod Q and u = -cd mod Q.

2. Compute s_{k-1} and its dual according to Theorem 1.

3. Compute the (v-1)th state of the sequence ((s_{k-1}, s_k, s_{k+1}), f(x))
according to Algorithm 1. This step gives s_{v+k} and its dual.

4. Construct g(x) = x^3 - s_{v+k} x^2 + s_{-(v+k)} x - 1 and compute s_u(g), s_{-u}(g)
according to Algorithm 1.

Here, we have s_u(g) = s_{c(h-dk)} and s_{-u}(g) = s_{-c(h-dk)}.
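The index bookkeeping of Algorithm 2 rests on u(v + k) = c(h - dk) mod Q. The Python sketch below checks this end to end on the toy sequence of Example 1 (p = 5, a = 0, b = 1, Q = 31). It is not the fast procedure: plain stepping of the recurrence stands in for Algorithm 1, the full state (s_{k-1}, s_k, s_{k+1}) is supplied directly instead of being recovered with Theorem 1, and all function names are hypothetical.

```python
def step(state, a, b, p, n):
    """Advance a 3-term state n steps under s_{i+3} = a s_{i+2} - b s_{i+1} + s_i."""
    x, y, z = state
    for _ in range(n):
        x, y, z = y, z, (a * z - b * y + x) % p
    return (x, y, z)


def term(k, a, b, p):
    """k-th term (k >= 0) of the char-sequence of x^3 - a x^2 + b x - 1 over GF(p)."""
    return step((3 % p, a % p, (a * a - 2 * b) % p), a, b, p, k)[0]


def mixed_term(state, dual_state, c, h, d, a, b, p, Q):
    """Return (s_{c(h-dk)}, s_{-c(h-dk)}) without using k itself.

    state = (s_{k-1}, s_k, s_{k+1}); dual_state = (s_{-(k-1)}, s_{-k}, s_{-(k+1)}).
    """
    v = (-h * pow(d, -1, Q)) % Q               # Step 1
    u = (-c * d) % Q
    s_vk = step(state, a, b, p, v)[1]          # Step 3: u_v = s_{v+k} (Lemma 10)
    s_mvk = step(dual_state, b, a, p, v)[1]    # dual, via the reciprocal recurrence
    # Step 4: u-th terms of the char-sequences of g(x) and of its reciprocal
    return term(u, s_vk, s_mvk, p), term(u, s_mvk, s_vk, p)


def check(k, c, h, d, a=0, b=1, p=5, Q=31):
    """Compare mixed_term with the term computed directly from the secret index k."""
    state = tuple(term(k - 1 + i, a, b, p) for i in range(3))
    dual = tuple(term(k - 1 + i, b, a, p) for i in range(3))
    idx = (c * (h - d * k)) % Q
    return mixed_term(state, dual, c, h, d, a, b, p, Q) == (term(idx, a, b, p),
                                                            term(idx, b, a, p))
```

Only `check` touches k; `mixed_term` sees nothing but the state and its dual, which mirrors the situation of a verifier.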

Note. All results that we have discussed so far are true for general q and Q. 

Lemma 11. With the same notation as used in Algorithm 2, computing
s_{c(h-dk)} and its dual needs 2 · 4(log v + log u) multiplications and 2 · 4(log v + log u)
squarings in GF(q) on average. In particular, if q = p^2, computing s_{c(h-dk)} and
its dual needs 2 · 20(log v + log u) multiplications in GF(p) on average.

Proof. In Algorithm 2, the computational cost depends only on how many times
Algorithm 1 is invoked. Since Algorithm 2 invokes Algorithm 1 twice, applying
Lemma 7, it needs 16 multiplications in GF(q). According to Lemma 9, each
invocation of Algorithm 1 needs 20(log v + log u) multiplications in GF(p).
In total, it needs 2 · 20(log v + log u) multiplications in GF(p).

Remark 4. When we apply Algorithm 2 to the char-sequences used in the XTR,
it avoids the matrix computation given in Algorithm 2.4.8 [7]. So, this
algorithm is more efficient than the algorithm given in [7].

5.2 The GH Digital Signature Algorithm 

We are now ready to present the GH ElGamal-like digital signature algorithm. 
Note that the GH signature scheme can also be modified into variants of gener- 
alized ElGamal-like signature schemes as listed in [6]. 

Algorithm 3 (GH-DSA)

System public parameters: p is a prime, q = p^2, and f(x) = x^3 - a x^2 +
b x - 1, which is an irreducible polynomial over GF(q) with period Q, where Q
satisfies the condition that Q = P_1 P_2, P_1 is a prime divisor of p^2 + p + 1 and
P_2 is a prime divisor of p^2 - p + 1. Let GF(p^2) be defined by the irreducible
polynomial x^2 + 1 (see Section 3) and \gamma be its root in GF(p^2).

Alice: Choose x, with 0 < x < Q and gcd(x, Q) = 1, as her private key and
compute (s_x, s_{-x}) as her public key. For a message m that Alice needs to sign,
she follows the procedure:

1. Randomly choose k, with 0 < k < Q and gcd(k, Q) = 1, and use Algorithm 1
to compute (s_{k-1}, s_k, s_{k+1}) and its dual such that r = s_{k,0} + s_{k,1} p is coprime
with Q, where s_k = s_{k,0} + s_{k,1} \gamma. (Here, we adopt an approach similar to that
used in the Elliptic Curve digital signature algorithm [12] to form an integer
r in the digital signing process.)

2. Compute h = h(m), where h is a hash function.

3. Compute t = k^{-1}(h - xr) mod Q (i.e., the signing equation is h = xr + kt
mod Q).

Then (r, t) is a digital signature of the message m. Alice sends Bob (m, r, t)
together with (s_k, s_{k+1}) and its dual.

Bob: Perform the following verification process.

Check whether gcd(t, Q) = 1.

Case 1. gcd(t, Q) = 1.

1. Compute v = t r^{-1} mod Q and u = h r^{-1} mod Q.

2. Compute s_{u-vk} and its dual according to Algorithm 2.

3. Check whether both s_{u-vk} = s_x and s_{-(u-vk)} = s_{-x}. If so, Bob accepts it as a valid
signature. Otherwise, Bob rejects it.

Case 2. gcd(t, Q) > 1.

1. Compute s_{h-rx} and its dual according to Algorithm 2.

2. Form g(x) = x^3 - s_k x^2 + s_{-k} x - 1 and compute s_t(g) and its dual according
to Algorithm 1.

3. Check whether both s_{h-rx} = s_t(g) and s_{-(h-rx)} = s_{-t}(g). If so, Bob accepts it as
a valid signature. Otherwise, Bob rejects it.
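The two verification cases rely on index identities modulo Q that follow from the signing equation h = xr + kt mod Q. The Python sketch below checks only these identities on integers; it does not compute any sequence terms, the function names are hypothetical, and the toy modulus Q = 91 = 13 · 7 (which corresponds to p = 3, since 3^2 + 3 + 1 = 13 and 3^2 - 3 + 1 = 7) is far too small for real use. The checking function takes the secrets x and k as inputs, which a real verifier never sees.

```python
from math import gcd


def sign_exponent(x, k, r, h, Q):
    """Return t with h = x*r + k*t (mod Q); k must be invertible modulo Q."""
    return (pow(k, -1, Q) * (h - x * r)) % Q


def indices_match(t, r, h, k, x, Q):
    """Check the index identity behind the verification case selected by gcd(t, Q)."""
    if gcd(t, Q) == 1:
        v = (t * pow(r, -1, Q)) % Q
        u = (h * pow(r, -1, Q)) % Q
        return (u - v * k) % Q == x % Q        # Case 1: s_{u - vk} = s_x
    return (h - r * x) % Q == (k * t) % Q      # Case 2: s_{h - rx} = s_{kt} = s_t(g)


Q_TOY = 91
T_CASE1 = sign_exponent(5, 11, 4, 60, Q_TOY)   # x = 5, k = 11, r = 4, h = 60
T_CASE2 = sign_exponent(5, 11, 4, 6, Q_TOY)    # h chosen so that 7 divides t
```

In Case 1 the inverse of r modulo Q is needed, which is why the signer is required to pick k so that r is coprime with Q.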

Lemma 12. The security of the GH-DSA is based on the difficulty of solving
the discrete logarithm in GF(q^3) = GF(p^6). The signing and verifying processes
need respectively 20 log Q multiplications and 2 · 20 log Q multiplications in GF(p)
on average.

Proof. Since f(x) is an irreducible polynomial over GF(q) and the period of
f(x) is Q = P_1 P_2, where P_1 | p^2 + p + 1 and P_2 | p^2 - p + 1, a root of f(x) is in
GF(p^6) - (GF(p^2) \cup GF(p^3)). Similarly to what we have proved in [5], the problem
of solving for x from (s_x, s_{-x}) or solving for k from (s_k, s_{-k}) is equivalent to
computing the DL in GF(p^6). Thus, the first assertion is established.

Note that the probability of any number less than Q which is not coprime 
with Q is given by 

Prob{gcd{z, Q) > 1 : 0 < z < Q} = ^ ^ ^ . (21) 

P 1 P 2 

Thus, in the signing process, we only need to estimate the computational cost
of invoking Algorithm 1 at Step 1, which is 20 log Q multiplications in GF(p) on
average. For Case 1 in the verifying process, it can be seen from Lemma 9 that
invoking Algorithm 2 at Step 2 needs 2 · 20 log Q multiplications in GF(p) on
average. In Case 2, it invokes Algorithm 2 at Step 1 and Algorithm 1 at Step 2.
Thus, it needs 3 · 20 log Q multiplications in GF(p) on average. Combined with
(21), the verifying process needs 2 · 20 log Q multiplications in GF(p) on average.

6 Conclusion 

In this paper, we discuss an efficient algorithm that uses the signed-digit
representation to compute the kth term of a characteristic sequence generated by
a linear feedback shift register of order 3 over GF(q). Then we propose an efficient
algorithm to compute the (h - dk)th term of the characteristic sequence based on
the knowledge of the kth term, where k is unknown. By using these new results
on the characteristic sequences, the GH-DSA (Digital Signature Algorithm) is
developed.

Remark 5. The GH cryptosystem, just like the elliptic curve public-key
cryptosystem, enjoys the benefit of using a shorter key to achieve high security. Also,
the GH cryptosystem can be resistant to power analysis attacks and timing analysis
attacks without increasing the cost of computation. This is due to the evaluation
formulas given in Lemma 5 of Section 2.
Appendix

A An Algorithm to Obtain Maximal-Weight SD Representation

When the binary representation of an integer is given, its binary maximal-weight 
SD representation can be generated with the following algorithm. 

Algorithm 4 Maximal-weight signed-digit recoding 

Input: the binary representation of A: a_{n-1} a_{n-2} ... a_0, a_i \in {0, 1}
and a_{n-1} = 1;

Output: the binary maximal-weight SD representation of A:
b_{n-1} b_{n-2} ... b_0, b_i \in {-1, 0, 1};

1. initialize the flag: t = 0;

2. FOR i = 0 TO n - 2

(a) IF t = 0 THEN

i. IF a_i = 0 THEN b_i = 0;

ii. ELSE {t = 1; IF a_{i+1} = 0 THEN b_i = -1; ELSE b_i = 1;}

(b) ELSE

i. IF (a_i = 1 AND a_{i+1} = 0) THEN b_i = -1;

ii. IF (a_i = 1 AND a_{i+1} = 1) THEN b_i = 1;

iii. IF (a_i = 0 AND a_{i+1} = 0) THEN b_i = -1;

iv. IF (a_i = 0 AND a_{i+1} = 1) THEN b_i = 1;

3. b_{n-1} = a_{n-1};

The correctness of this algorithm can be proved. It is worth pointing out that
the maximal-weight SD representation always has the same length as the binary
form. If the maximal-weight SD representation of a negative integer (-A) is
required, it can be obtained by negating each digit in the maximal-weight SD
representation of A. It can be seen that the Hamming weight of the maximal-weight
SD representation of -A is the same as that of A.
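
The recoding rule can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names `max_weight_sd` and `sd_value` are hypothetical, digits are stored least significant first, and the four flagged cases of step 2(b) are collapsed into one branch because in each of them the output digit depends only on a_{i+1}.

```python
def max_weight_sd(a):
    """Recode a positive integer into maximal-weight signed-digit form.

    Returns the digits b_0 ... b_{n-1} (least significant first),
    each in {-1, 0, 1}, following the flag-based rule of Algorithm 4.
    """
    if a <= 0:
        raise ValueError("a must be a positive integer")
    bits = [(a >> i) & 1 for i in range(a.bit_length())]  # a_0 ... a_{n-1}
    n = len(bits)
    digits = [0] * n
    flag = 0  # set to 1 once the lowest nonzero bit has been seen
    for i in range(n - 1):
        if flag == 0 and bits[i] == 0:
            digits[i] = 0
        else:
            flag = 1
            # In every remaining case the digit is decided by a_{i+1} alone.
            digits[i] = 1 if bits[i + 1] == 1 else -1
    digits[n - 1] = bits[n - 1]  # top digit is copied; it is always 1
    return digits


def sd_value(digits):
    """Evaluate a signed-digit string given least significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))
```

For example, `max_weight_sd(9)` yields the digits (-1, -1, 1, 1) from least to most significant, that is 8 + 4 - 2 - 1 = 9, with the same length as the binary form 1001.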
B An Estimation of the Parameter h

In Algorithm 1, let k = k_0 k_1 ... k_r, 2^r ≤ k ≤ 2^{r+1} - 1, be given in its
maximal-weight SD representation. Then, from Lemma 6, the probabilities
Pr{k_i = 0}, 0 < i < r, sum to r(r-1)/2^{r+1}. Thus, the average value of h can be
given by

h(r) = r - r(r-1)/2^{r+1}.

The following table shows some values of h(r) as a function of r.



Table 1. Some values of h(r), Max{h(r)}, Min{h(r)} and r.

r           2     3     4     5     6     7     8     9     10    12     r > 15
h(r)        1.75  2.62  3.62  4.69  5.77  6.84  7.89  8.93  9.96  11.98  r
Max{h(r)}   2     3     4     5     6     7     8     9     10    12     r
Min{h(r)}   0     0     0     0     0     0     0     0     0     0      0

Although the value of h can be as small as 0, it can be seen that the average
value h(r) is approximately equal to its maximal value r when r > 10.
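
The tabulated averages can be reproduced with a short check. This is a sketch under an explicit assumption: the closed form h(r) = r - r(r-1)/2^{r+1} is an inferred reading that happens to match every entry of Table 1 to two decimals; the function name `average_h` is hypothetical.

```python
def average_h(r):
    # Assumed closed form for the average of h; it reproduces Table 1.
    return r - r * (r - 1) / 2 ** (r + 1)


# Values of h(r) as printed in Table 1.
TABLE_1 = {2: 1.75, 3: 2.62, 4: 3.62, 5: 4.69, 6: 5.77,
           7: 6.84, 8: 7.89, 9: 8.93, 10: 9.96, 12: 11.98}

for r, printed in TABLE_1.items():
    print(r, printed, round(average_h(r), 4))
```

The gap r - h(r) shrinks like r^2 / 2^{r+1} under this assumption, which is consistent with the remark that h(r) is close to r once r exceeds 10.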
Abstract. A. K. Lenstra and E. R. Verheul in [2] proposed a very efficient
method called XTR, in which a certain subgroup of the Galois field
GF(p^6) can be represented by elements in GF(p^2). At the end of their
paper [2], they briefly mentioned a method of generalizing their idea
to the field GF(p^{6m}). In this paper, we give a systematic design of this
generalization and discuss optimal choices for p and m with respect
to performance. If we choose m large enough, we can reduce the size of
p to the word size of common processors. In such a case, this
extended XTR is well suited for processors with optimized arithmetic
on integers of word size.



1 Introduction 

After the Diffie-Hellman (DH) key agreement protocol was published, many
related key agreement protocols have been proposed. Very recently, in [2], A. K.
Lenstra and E. Verheul proposed an efficient computational tool called XTR
(Efficient and Compact Subgroup Trace Representation) and showed that it
can be adapted to various public key systems including key exchange protocols.
Their scheme results in a relatively efficient system with respect to
computational and communication complexity compared to currently known public key
schemes using subgroups. At the end of their paper, they mentioned very briefly
that XTR can be generalized in a straightforward way using the extension field
of the form GF(p^{6m}) and made some general comments with the focus on the
case p = 2.

In this paper, we carry out the generalization in detail and discuss
optimal choices of the parameters p and m. The idea is mostly straightforward,
but we need to be more systematic to find optimal choices of p and m
among the possible cases. In more detail, the generalization is done in two steps.
First, we propose a systematic design for an XTR-like system in GF(p^{6m}) using an
irreducible cubic polynomial F(c, X) = X^3 - cX^2 + c^{p^m} X - 1 ∈ GF(p^{2m})[X] for

* Yie and Kim’s work was supported by INHA Univ. Research Grant (INHA-21072). 



S. Vaudenay and A. Youssef (Eds.): SAC 2001, LNCS 2259, pp. 301-312, 2001. 
any m. Then we determine the required properties of the parameters m, p, and c
for F(c, X) under efficiency and security considerations. We focus on
the case that is efficient in limited applications such as smart cards. We use
an optimal normal basis to represent elements of GF(p^{2m}) over GF(p). Hence we
consider the case where 2m + 1 is a prime and p is a primitive element modulo
2m + 1. We suggest using m such that either 2m + 1 is a Fermat prime or both
m and 2m + 1 are primes. With such a choice of m, a randomly chosen prime p
has a better chance of being a primitive element in Z_{2m+1}.

We estimate the required computational complexity for XTR extended to
GF(p^{6m}) under the above considerations, and compare the result with XTR
in GF(P^6), where P and p^m have the same bit size. The result shows that
the required numbers of bit operations for both are about the same.

Modern workstation microprocessors are designed to calculate in units of
data known as words. For a large prime p, multiple machine words are required to
represent elements of the prime field GF(p) on microprocessors, since typical word
sizes are not large enough. This representation causes two computational
difficulties: carries between words must be handled, and reduction modulo p must
be performed with operands that span multiple words.

Hence, using a prime p as small as the word size of common processors
relieves the above computational difficulties, which arise for operations
in GF(P) for a large prime P. In this way, the proposed generalization maintains
the communication advantages of XTR and enhances its computational
advantages, since there is no need for multiprecision computation,
especially when the system is implemented on workstation processors with
optimized arithmetic on integers of word size.

Unfortunately, there are some drawbacks in extending XTR to GF(p^{6m}). As
m gets larger, there are fewer prime numbers p, q that can be used to establish
an XTR-like system in GF(p^{6m}). It also takes longer to generate the primes p, q
with the required properties. But generating p, q is a one-time task, so this is
not a serious disadvantage in many cases.

Our paper is organized as follows. In Section 2, we describe the generalized
system in such a way that it can be formulated for any m, setting aside
any security or complexity related concerns. In Section 3, we discuss the
security of the system and determine the choices of parameters. In Section 4,
we estimate the computational complexity of the proposed XTR-like scheme in
GF(p^{6m}). Then we discuss optimal choices of parameters. In Section 5, we
conclude our paper with a comparison of efficiency and security for cryptographic
schemes using this generalization with those using the original XTR over
Galois fields of about the same size. We also discuss an efficient way
of generating parameters for the XTR system extended to GF(p^{6m}) and recommend
good choices of m for current use in Appendix A.
2 A Description of XTR Extended to GF(p^{6m})

In this section we describe the XTR system using elementary symmetric
polynomials and give a systematic way to generalize the XTR-like system to GF(p^{6m}).
Following [2], we start by setting up some notation. Given an element
c ∈ GF(p^{2m}), we define the cubic polynomial F(c, X) as follows:

F(c, X) = X^3 - cX^2 + c^{p^m} X - 1 = (X - h_0)(X - h_1)(X - h_2),

where the roots h_i are taken from the splitting field of F(c, X). We set
c_n = h_0^n + h_1^n + h_2^n for any integer n. Then, from the root-coefficient
relations of a cubic equation, we have

Lemma 1. For any integers n and t we have

1. c_1 = c, h_0 h_1 + h_1 h_2 + h_0 h_2 = c^{p^m}, and h_0 h_1 h_2 = 1;

2. c_{-n} = c_{n p^m} = c_n^{p^m};

3. Either all h_i's are in GF(p^{2m}) or F(c, X) is irreducible over GF(p^{2m}) and
all h_i's have order dividing p^{2m} - p^m + 1;

4. (c_n)_t = c_{tn} = (c_t)_n.

Proof. Item 1 is nothing but the root-coefficient relation. Items 2 and 3 can
be proved in exactly the same way as Lemma 2.3.2 of [2].

To prove item 4, note that

c_n = h_0^n + h_1^n + h_2^n, c_n^{p^m} = c_{-n} = (h_1 h_2)^n + (h_0 h_2)^n + (h_0 h_1)^n.

This implies that

F(c_n, X) = X^3 - c_n X^2 + c_n^{p^m} X - 1 = (X - h_0^n)(X - h_1^n)(X - h_2^n).

And thus we see that

(c_n)_t = (h_0^n)^t + (h_1^n)^t + (h_2^n)^t = h_0^{nt} + h_1^{nt} + h_2^{nt}.

Hence we see that (c_n)_t = c_{nt} = (c_t)_n for any integers n, t.

It can be easily checked that any irreducible cubic polynomial
f(x) = x^3 - ax^2 + bx - 1 ∈ GF(p^{2m})[x] is of the form
f(x) = x^3 - ax^2 + a^{p^m} x - 1 if the order of the roots
h_0, h_1, h_2 ∈ GF(p^{6m}) of f(x) = 0 divides p^{2m} - p^m + 1.

Recall that the elementary symmetric polynomials σ_k of degree k in the
indeterminates X_1, X_2, X_3 are given by

σ_1 = X_1 + X_2 + X_3, σ_2 = X_1 X_2 + X_2 X_3 + X_1 X_3, σ_3 = X_1 X_2 X_3.

Here is a theorem, due to Newton, the so-called ‘fundamental theorem on
symmetric polynomials’.
Theorem 1. (Theorem 1.75 in [7]) Let σ_1, σ_2, σ_3 be the elementary symmetric
polynomials in X_1, X_2, X_3 over a commutative ring R, and let s_0 = 3,
s_n = X_1^n + X_2^n + X_3^n ∈ R[X_1, X_2, X_3] for n ≥ 1. Then the following equality

s_k - s_{k-1} σ_1 + s_{k-2} σ_2 - s_{k-3} σ_3 = 0

holds for k ≥ 3.
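
Newton's identity can be checked numerically on concrete values. The following is a small sketch over the integers; the function name `power_sums` and the sample values are illustrative assumptions.

```python
def power_sums(x1, x2, x3, k_max):
    """Return [s_0, ..., s_k_max] computed only from sigma_1, sigma_2, sigma_3."""
    sigma1 = x1 + x2 + x3
    sigma2 = x1 * x2 + x2 * x3 + x1 * x3
    sigma3 = x1 * x2 * x3
    s = [3, sigma1, sigma1 * sigma1 - 2 * sigma2]  # s_0, s_1, s_2
    for k in range(3, k_max + 1):
        # Newton's identity solved for s_k.
        s.append(s[k - 1] * sigma1 - s[k - 2] * sigma2 + s[k - 3] * sigma3)
    return s[:k_max + 1]


xs = (2, -3, 5)
direct = [sum(x ** k for x in xs) for k in range(11)]
print(power_sums(*xs, 10) == direct)
```

The same recurrence, specialised to σ_1 = c, σ_2 = c^{p^m} and σ_3 = 1, is what drives the sequence c_n in the rest of this section.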

As a direct application of Newton’s Theorem the following lemma can be
easily proved.

Lemma 2. For any positive integer n we have

1. c_{n+2} = c_{n+1} c - c_n c^{p^m} + c_{n-1}, c_{n-1} = c_{n+2} - c_{n+1} c + c_n c^{p^m};

2. c_{2n} = c_n^2 - 2 c_n^{p^m};

3. c_{2n+1} = c_n c_{n+1} - c c_n^{p^m} + c_{n-1}^{p^m};

4. c_{2n-1} = c_n c_{n-1} - c^{p^m} c_n^{p^m} + c_{n+1}^{p^m}.

As in [2], we denote S_n(c) = (c_{n-1}, c_n, c_{n+1}) for any integer n. Then, by
item 2 of Lemma 2, we see that S_{-1}(c) = (c^{2p^m} - 2c, c^{p^m}, 3),
S_0(c) = (c^{p^m}, 3, c), and S_1(c) = (3, c, c^2 - 2c^{p^m}). Thus, if we do not
care about efficiency or security, we can define an XTR-like key exchange system
as follows for any prime p, any positive integer m, and any c ∈ GF(p^{2m}).

XTR-DH key exchange system in GF(p^{6m}):

1. Alice chooses a random integer n, computes S_n(c), and sends c_n to Bob.

2. Bob chooses a random integer t, computes S_t(c), and sends c_t to Alice.

3. Alice and Bob share the key c_{nt}, which can be obtained by computing either

S_n(c_t) = ((c_t)_{n-1}, (c_t)_n, (c_t)_{n+1}) or S_t(c_n) = ((c_n)_{t-1}, (c_n)_t, (c_n)_{t+1}).
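
The key agreement can be illustrated with a toy computation. This is a sketch under several assumptions that are not taken from the paper: m = 1, a tiny prime P = 10007 with P ≡ 3 (mod 4) so that GF(P^2) can be modelled as pairs (a, b) standing for a + bi with i^2 = -1, and c_n computed by powering the companion matrix of F(c, X) rather than by the faster method of Section 4. All function names are hypothetical, and the parameters offer no security.

```python
P = 10007  # toy prime with P % 4 == 3, so GF(P^2) = GF(P)[i] with i^2 = -1


def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)


def sub(x, y):
    return ((x[0] - y[0]) % P, (x[1] - y[1]) % P)


def mul(x, y):
    return ((x[0] * y[0] - x[1] * y[1]) % P, (x[0] * y[1] + x[1] * y[0]) % P)


def frob(x):
    # x -> x^P; for P % 4 == 3 this is conjugation a + bi -> a - bi.
    return (x[0], -x[1] % P)


def mat_mul(a, b):
    out = []
    for i in range(3):
        row = []
        for j in range(3):
            acc = (0, 0)
            for k in range(3):
                acc = add(acc, mul(a[i][k], b[k][j]))
            row.append(acc)
        out.append(row)
    return out


def trace_term(c, n):
    """Return c_n (n >= 0) for the sequence attached to F(c, X)."""
    zero, one = (0, 0), (1, 0)
    # Companion matrix: maps (c_k, c_{k+1}, c_{k+2}) to (c_{k+1}, c_{k+2}, c_{k+3}).
    m = [[zero, one, zero], [zero, zero, one], [one, sub(zero, frob(c)), c]]
    r = [[one if i == j else zero for j in range(3)] for i in range(3)]
    while n:
        if n & 1:
            r = mat_mul(r, m)
        m = mat_mul(m, m)
        n >>= 1
    start = [(3, 0), c, sub(mul(c, c), mul((2, 0), frob(c)))]  # c_0, c_1, c_2
    result = zero
    for k in range(3):
        result = add(result, mul(r[0][k], start[k]))
    return result


c = (5, 7)             # public element of GF(P^2), chosen arbitrarily
n, t = 1234, 4321      # Alice's and Bob's secret exponents
alice_key = trace_term(trace_term(c, t), n)  # (c_t)_n
bob_key = trace_term(trace_term(c, n), t)    # (c_n)_t
print(alice_key == bob_key)
```

Both parties arrive at c_{nt} because of item 4 of Lemma 1; only the single field elements c_n and c_t cross the channel.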



All the XTR-based schemes given in [2] can be extended to GF(p^{6m}) similarly.
In the following sections, we discuss parameter selection to meet
various security levels and to improve efficiency.



3 Parameter Selection for Security Consideration 

Various XTR-based public key systems and key exchange protocols base their
security on the Discrete Logarithm Problem (DLP) in the base g ∈ GF(p^{6m}),
where g is a root of the cubic equation F(c, X) = 0. Therefore, in order to make
XTR-based schemes secure, we need to use parameters p, m, c such that the DLP
in the base g is difficult. In this section, we follow the method in [1] to determine
the size of the subgroup generated by g to prevent known attacks on the DLP in
extension fields.

Up to this point, one of the best attacks known for the DLP is the Index
Calculus Method using the Number Field Sieve. For a DLP in the base g ∈ GF(p^{6m}),
the asymptotic complexity of the Index Calculus Method using the Number Field
Sieve is

L[p^s, 1/3, 1.923 + o(1)],

where s is the smallest divisor of 6m such that g is contained in a subfield of
GF(p^{6m}) isomorphic to GF(p^s). Thus, for security reasons, it is desirable to
use g ∈ GF(p^{6m}) which is not contained in any proper subfield of GF(p^{6m}).
Note that if a root g of the polynomial F(c, X) is not contained in any proper
subfield of GF(p^{6m}), then F(c, X) is irreducible over GF(p^{2m}). Hence we have
c = Tr(g), and the roots of F(c, X) are the conjugates of g over GF(p^{2m}). Here,
Tr : GF(p^{6m}) → GF(p^{2m}) is the trace projection defined by
Tr(x) = x + x^{p^{2m}} + x^{p^{4m}} for x ∈ GF(p^{6m}).

The following lemma, which follows directly from Lemma 2.4 of [1], gives
a sufficient condition for a subgroup of GF(p^{6m})* not to be contained in any
proper subfield of GF(p^{6m}).

Lemma 3. Let q be a prime factor of Φ_{6m}(p). Then the subgroup of GF(p^{6m})*
of order q is not contained in any proper subfield of GF(p^{6m}).

Here Φ_n(X) is the n-th cyclotomic polynomial for a positive integer n not
divisible by p. For computations with cyclotomic polynomials, we can use Theorem
3.27 in [7]:

Φ_{6m}(X) = ∏_{d | 6m} (X^d - 1)^{μ(6m/d)},

where μ(·) is the Möbius function.

As we shall see in the next section, we will take m to be a prime or a power
of 2 (so that 2m + 1 is a Fermat prime) for easy selection of the prime p. Thus
we have

Φ_{6m}(X) = (X^{6m} - 1)(X^2 - 1)(X^3 - 1)(X^m - 1) / [(X^{2m} - 1)(X^{3m} - 1)(X^6 - 1)(X - 1)]
          = (X^{2m} - X^m + 1) / (X^2 - X + 1)

if m is a prime greater than 3, or

Φ_{6m}(X) = (X^{6m} - 1)(X^m - 1) / [(X^{2m} - 1)(X^{3m} - 1)] = X^{2m} - X^m + 1

if m is a power of 2.
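
Both closed forms can be checked numerically against the Möbius product formula. The sketch below evaluates Φ_n at an integer with exact rational arithmetic; the helper names `mobius` and `cyclotomic_value` are hypothetical, and the prime case is only exercised for primes m > 3, where m does not divide 6.

```python
from fractions import Fraction


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # a squared prime factor
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result


def cyclotomic_value(n, x):
    """Evaluate Phi_n(x) as the product of (x^d - 1)^mu(n/d) over d | n."""
    value = Fraction(1)
    for d in range(1, n + 1):
        if n % d == 0:
            mu = mobius(n // d)
            if mu == 1:
                value *= x ** d - 1
            elif mu == -1:
                value /= x ** d - 1
    return int(value)


p = 7
m = 5  # prime case: Phi_{6m}(p) * (p^2 - p + 1) = p^{2m} - p^m + 1
print(cyclotomic_value(6 * m, p) * (p * p - p + 1) == p ** (2 * m) - p ** m + 1)
m = 8  # power-of-2 case: Phi_{6m}(p) = p^{2m} - p^m + 1
print(cyclotomic_value(6 * m, p) == p ** (2 * m) - p ** m + 1)
```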

When we use a system based on the DLP in a multiplicative subgroup of size
q of the Galois field, the sizes of q and the underlying Galois field that guarantee
the security required currently can be determined according to the table in [4].
The recommended size for q is much larger than p^2 - p + 1 for small or medium
sized p. Thus, in our case we need to take the size q of the subgroup of GF(p^{6m})*
to be much larger than p^2 - p + 1. In addition, if we take q to be a prime factor
of p^{2m} - p^m + 1, then the calculation in the previous paragraph tells us that q
is a prime factor of Φ_{6m}(p). Then, by Lemma 3, the subgroup of order q is not
contained in any proper subfield of GF(p^{6m}).
Also, it is well known that birthday-type attacks can be applied to get x from
a given g^x, and the complexity of birthday attacks can be estimated by O(√q),
where q is the order of g. Currently the recommended size for q is q > 2^{160}. Thus,
to make our system as secure as the DLP in GF(p^{6m}) against currently
known attacks, it is enough to choose g ∈ GF(p^{6m}) so that the order of g is a
prime factor of p^{2m} - p^m + 1 and larger than max(p^2 - p + 1, 2^{160}).

4 Parameter Selection for Efficiency Considerations

The most basic computation required in XTR-like schemes is to compute S_n(c)
from any given c ∈ GF(p^{2m}) and a positive integer n. From Lemma 2 we have

Lemma 4. Let c be any element of GF(p^{2m}) and A(c) be the 3 × 3 matrix given
by

       ( 0  0  1        )
A(c) = ( 1  0  -c^{p^m} )
       ( 0  1  c        )

For any integer n ≥ 1, we have S_{n+1}(c) = S_n(c) A(c).



By modifying the ‘square and multiply’ method, we can compute S_n(c) as
follows:

Algorithm 2 Let c ∈ GF(p^{2m}) and a positive integer n be given. From the
binary expansion n = Σ_{i=0}^{k} m_i 2^{k-i}, define a sequence {t_i} by

t_0 = m_0,
t_i = 2 t_{i-1} + m_i, i ≥ 1.

To compute S_n(c) = (c_{n-1}, c_n, c_{n+1}), follow the steps:

Step 1. Set S_{t_0}(c) = (c^{p^m}, 3, c) if t_0 = 0, or S_{t_0}(c) = (3, c, c^2 - 2c^{p^m}) if t_0 = 1.

Step 2. Compute S_{t_i} from S_{t_{i-1}} for i = 1, 2, ..., k.

Step 3. Output S_{t_k}.

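Step 2 of the above algorithm is performed as follows. For a fixed i, let
d = t_{i-1} for simplicity. If m_i = 0, then

S_{t_i} = S_{2d} = (c_d c_{d-1} - c^{p^m} c_d^{p^m} + c_{d+1}^{p^m}, c_d^2 - 2c_d^{p^m}, c_d c_{d+1} - c c_d^{p^m} + c_{d-1}^{p^m}).

If m_i = 1, then

S_{t_i} = S_{2d+1} = (c_d^2 - 2c_d^{p^m}, c_d c_{d+1} - c c_d^{p^m} + c_{d-1}^{p^m}, c_{d+1}^2 - 2c_{d+1}^{p^m}).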
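
The doubling step can be sketched as a left-to-right ladder and cross-checked against the plain three-term recurrence of Lemma 2. This is an illustrative toy, not the paper's implementation: it assumes m = 1 and a small prime P ≡ 3 (mod 4), models GF(P^2) as pairs (a, b) for a + bi with i^2 = -1, and uses hypothetical function names throughout.

```python
P = 10007  # toy prime, P % 4 == 3, so x -> x^P is conjugation on GF(P)[i]


def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)


def sub(x, y):
    return ((x[0] - y[0]) % P, (x[1] - y[1]) % P)


def mul(x, y):
    return ((x[0] * y[0] - x[1] * y[1]) % P, (x[0] * y[1] + x[1] * y[0]) % P)


def frob(x):
    return (x[0], -x[1] % P)


def square_term(x):
    # c_{2d} = c_d^2 - 2 c_d^P
    f = frob(x)
    return sub(mul(x, x), add(f, f))


def xtr_ladder(c, n):
    """Return S_n(c) = (c_{n-1}, c_n, c_{n+1}) for n >= 1."""
    s = ((3, 0), c, square_term(c))  # S_1(c); the leading bit of n is 1
    for bit in bin(n)[3:]:           # remaining bits, most significant first
        lo, mid, hi = s              # c_{d-1}, c_d, c_{d+1}
        # c_{2d+1} = c_d c_{d+1} - c c_d^P + c_{d-1}^P
        odd_hi = add(sub(mul(mid, hi), mul(c, frob(mid))), frob(lo))
        if bit == "0":
            # c_{2d-1} = c_d c_{d-1} - c^P c_d^P + c_{d+1}^P
            odd_lo = add(sub(mul(mid, lo), mul(frob(c), frob(mid))), frob(hi))
            s = (odd_lo, square_term(mid), odd_hi)           # S_{2d}
        else:
            s = (square_term(mid), odd_hi, square_term(hi))  # S_{2d+1}
    return s


def naive_terms(c, count):
    """First `count` terms c_0, c_1, ... from the three-term recurrence."""
    seq = [(3, 0), c, square_term(c)]
    while len(seq) < count:
        seq.append(add(sub(mul(c, seq[-1]), mul(frob(c), seq[-2])), seq[-3]))
    return seq


c = (5, 7)
reference = naive_terms(c, 300)
print(all(xtr_ladder(c, n) == (reference[n - 1], reference[n], reference[n + 1])
          for n in range(1, 299)))
```

Each ladder step uses a fixed number of field operations, so the total cost grows with the bit length of n rather than with n itself.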

As we can see from Lemma 2 and Algorithm 2, the most frequently performed
operations in our system are the following three types:

x^2, xy, xz - yz^{p^m} for x, y, z ∈ GF(p^{2m}).
In order to make the XTR system in GF(p^{6m}) efficient, it is enough to perform
these operations efficiently. Thus, we consider the case when GF(p^{2m}) has an
optimal normal basis (ONB) of type I, that is, the case when 2m + 1 is a prime
number and p (mod 2m + 1) is a primitive element in Z_{2m+1}. In this case, the
cyclotomic polynomial

Φ_{2m+1}(X) = (X^{2m+1} - 1)/(X - 1) = X^{2m} + X^{2m-1} + ... + X + 1 ∈ GF(p)[X]

is irreducible and the set B_1 = {α, α^p, α^{p^2}, ..., α^{p^{2m-1}}} of its roots
becomes an optimal normal basis for GF(p^{2m}). Since p (mod 2m + 1) is a primitive
element in Z_{2m+1}, this basis B_1 is setwise equal to the (almost) polynomial
basis B_2 = {α, α^2, α^3, ..., α^{2m}}.
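
Since α is a root of the (2m+1)-th cyclotomic polynomial, α^{p^i} depends only on p^i mod (2m + 1), so the claim that the two bases coincide as sets can be checked on exponents alone. The following sketch does this; the function names are hypothetical and 2m + 1 is assumed to be a prime different from p.

```python
def normal_basis_exponents(p, m):
    """Exponents e with alpha^e in B_1, i.e. p^i mod (2m + 1) for 0 <= i < 2m."""
    n = 2 * m + 1
    return sorted(pow(p, i, n) for i in range(2 * m))


def bases_coincide(p, m):
    """True when B_1 equals B_2 = {alpha^1, ..., alpha^(2m)} as a set."""
    return normal_basis_exponents(p, m) == list(range(1, 2 * m + 1))


print(bases_coincide(2, 5))  # 2 is a primitive element modulo 11
print(bases_coincide(3, 5))  # 3 has order 5 modulo 11, so B_1 is not a basis here
```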

The complexity of elementary operations in GF(p^{2m}) is well studied by
Lenstra in [1]. Our focus is on the most frequently performed operations

x^2, xy, xz - yz^{p^m} for x, y, z ∈ GF(p^{2m})

as in [2]. We have a similar result to that in [2] on the complexity of these
operations.

Lemma 5. Let p and 2m + 1 be prime numbers, where p (mod 2m + 1) is a
primitive element in Z_{2m+1}. Then for x, y, z ∈ GF(p^{2m}), we have

1. Computing x^p is for free.

2. Computing x^2 takes 80 percent of the complexity taken for multiplications in
GF(p^{2m}).

3. Computing xy takes 4m^2 multiplications in GF(p).

4. Computing xz - yz^{p^m} takes 4m^2 multiplications in GF(p).

Proof. Since we can use either of the two bases B_1 and B_2 at our convenience,
all the items are straightforward. Thus we only prove item 4 in detail. Set t = 2m.
First we represent x, y, z by using the normal basis B_1,

x = Σ_{i=0}^{t-1} a_i α^{p^i}, y = Σ_{i=0}^{t-1} b_i α^{p^i}, z = Σ_{i=0}^{t-1} c_i α^{p^i},

and we get

z^{p^m} = Σ_{i=0}^{m-1} c_{i+m} α^{p^i} + Σ_{i=m}^{t-1} c_{i-m} α^{p^i}.

Then we have 

t—1 m — 1 f— 1 f— 1 

xz — yz^ = ^ ^ ^ ^ bic''a^ 

i—0 j—0 2=0 j=m 



where c' = c_j + c_{m+j} and c'' = c_j + c_{j-m}. Getting c', c'' from the c_i's is a free
operation. Now we convert the basis into B_2, and then the computation of
α^{p^i} α^{p^j} becomes free. Hence we need 4m^2 multiplications in GF(p) to compute
xz - yz^{p^m}.

We note here that we have not used any high speed multiplication algorithms
in the proof. When more efficient multiplication algorithms are applied, the
required bit complexity will be reduced. Comparing Lemma 2.1.1 in [2] and the
above Lemma 5, we see that the numbers of bit operations for computing

xy and xz - yz^{p^m} (xz - yz^P in the first case below)

are about the same in the following two cases when |P| = m|p|:

- x, y, z ∈ GF(P^2) with P ≡ 2 (mod 3);

- x, y, z ∈ GF(p^{2m}) where 2m + 1 is a prime and ⟨p⟩ = Z*_{2m+1}.

Now, based on Lemma 5, we have the following estimate for the computational
complexity of Algorithm 2.

Theorem 3. Let c ∈ GF(p^{2m}) and a positive integer n be given. Then it takes at
most 11.2 m^2 log n multiplications in GF(p) to compute S_n(c) = (c_{n-1}, c_n, c_{n+1}).

Also we estimate the required bit complexity to compute Tr(g^a g^{bk}) from
given Tr(g) and S_k(Tr(g)) for unknown k. This computation is required for the
XTR-Nyberg-Rueppel signature scheme as in [2].

Theorem 4. Let g ∈ GF(p^{6m}) be an element of prime order q and suppose
Tr(g) and S_k(Tr(g)) are given for some unknown positive integer k. Let a, b be
positive integers less than q. Then Tr(g^a g^{bk}) can be computed at a cost of
(11.2 log(a/b (mod q)) + 11.2 log b + 36) m^2 multiplications in GF(p).

We specify the steps as follows:

- Compute e = a/b (mod q).

- Compute Tr(g^{k+e}).

- Compute Tr(g^{(k+e)b}) = Tr(g^a g^{bk}).



We focus on the second item here. We have

S_k(Tr(g)) A(c)^e = [S_0(Tr(g)) A(c)^k] A(c)^e = [S_0(Tr(g)) A(c)^e] A(c)^k.

In order to get Tr(g^{k+e}), what we need is [S_0(Tr(g)) A(c)^e] C(A(c)^k), where
C(A(c)^k) is the center column of the matrix A(c)^k. We are already given
S_k(Tr(g)), and we know that

               ( S_{-1}(c) )
S_k(Tr(g))^T = ( S_0(c)    ) C(A(c)^k),
               ( S_1(c)    )

where S_k(Tr(g))^T is the transpose of S_k(Tr(g)). Hence we have the very same
formula as in XTR,

            ( S_{-1}(c) )^{-1}
C(A(c)^k) = ( S_0(c)    )      S_k(Tr(g))^T.
            ( S_1(c)    )

This implies that computing C(A(c)^k) takes constant time for any k when
S_k(Tr(g)) is given. In fact, it takes at most 9 × 4m^2 = 36m^2 multiplications
in GF(p). Hence the complexity to compute Tr(g^{k+e}) can be estimated as
(11.2 log e + 36) m^2 multiplications in GF(p), and the complexity to compute
the third item is 11.2 (log b) m^2 multiplications in GF(p). Thus we can estimate
the complexity to get Tr(g^a g^{bk}) by (11.2 log(a/b (mod q)) + 11.2 log b + 36) m^2
multiplications in GF(p).

Thus we see that the computational complexity is about the same for the
original XTR and our extension to GF(p^{6m}). But the multiprecision problem
that occurs in the operations in GF(P) for large P can be removed when we use
GF(p) with p as small as the word size of the processor.

Now we pose another condition on m so that it is easy to generate the prime
p. Since we are using an ONB, p is necessarily a primitive element in Z_{2m+1}. But as
a matter of parameter generation, we will decide m first and choose the prime p
somewhat randomly. So it is desirable that Z_{2m+1} has more primitive elements.

The number of primitive elements in Z_{2m+1} is φ(φ(2m + 1)) = φ(2m), where
φ(·) is the Euler totient function. Thus we want to make φ(2m) as big as possible
compared to m. There are two possible directions we can take to this end. The
first is to take m to be a power of 2. But then 2m + 1 must be a Fermat prime,
which is very rare. The other is to take m to be a prime. There are reasonably
many choices of primes m for which 2m + 1 is also a prime.
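
The proportion of primitive elements can be counted directly for small m. The brute-force sketch below assumes 2m + 1 is prime; the function names are hypothetical, and the printed fractions φ(2m)/(2m + 1) can be compared with the ratio column of Table 1 in Appendix A.

```python
from fractions import Fraction


def multiplicative_order(a, n):
    """Order of a modulo n, assuming gcd(a, n) == 1."""
    order, x = 1, a % n
    while x != 1:
        x = x * a % n
        order += 1
    return order


def primitive_ratio(m):
    """Fraction of residues modulo the prime 2m + 1 that are primitive elements."""
    n = 2 * m + 1
    count = sum(1 for a in range(1, n) if multiplicative_order(a, n) == n - 1)
    return Fraction(count, n)


for m in (1, 2, 3, 5, 8, 11, 23, 29, 41):
    print(m, primitive_ratio(m))
```

For prime m the count is φ(2m) = m - 1 and for m a power of 2 it is m, so in both families the ratio approaches 1/2 as m grows.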

An appropriate selection of m for current use is given in
Appendix A. After m and the sizes of p, q have been established, we generate p, q
and c = Tr(g). Algorithms for generating the parameters are also given in Appendix
A.

5 Conclusion 

Thus most of the details in XTR can be generalized systematically to the finite
field GF(p^{6m}) using the trace projection Tr : GF(p^{6m}) → GF(p^{2m}). Hence it is
straightforward to see that schemes such as XTR-DH, XTR-ElGamal encryption,
and XTR-Nyberg-Rueppel signatures can be extended to GF(p^{6m}).

For security concerns, all the details were given in Section 5 of [2]. The
discussion there concerns the DLP in GF(p^t), and hence it can be applied to the
cases in GF(p^{6m}). The communication and computational advantages of the XTR
schemes can be obtained in the generalization as long as we choose m so that
either 2m + 1 is a Fermat prime or both m and 2m + 1 are primes. Moreover, we do
not need any multiprecision operations if we select the size of p to be as small
as the word size of common processors, although we might need a longer time
to generate the prime numbers p and q in such cases. Since the prime numbers
need to be generated only once, the generalized version of XTR is preferable
for limited applications such as smart cards.
A An Efficient Way for Parameter Generation

In this appendix, we describe an efficient way for parameter generation step by
step. First, we decide the size of the field GF(p^{6m}) and which m to use. Then
we select the primes p, q. Finally, we select c ∈ GF(p^{2m}) \ GF(p^m).

As we discussed in Section 4, m will be chosen so that either 2m + 1 is a
Fermat prime or both m and 2m + 1 are primes. In Table 1 below, we give a
list of m which meet this criterion. The numbers in the column titled ‘ratio’ are
the proportions of primitive elements of Z_{2m+1}. The case m = 1 is the original
XTR. Not much improvement is achieved in the cases m = 2, 3. Thus we do not
recommend using these cases. Note that the ratio tends to 1/2 as m gets larger.

For a given m from Table 1, we choose the appropriate size for the field
characteristic p so that the size of p^{6m} is about the same as the recommended
size for prime fields with respect to the security concerns of the DLP in prime
fields. For example, see Table 2.

Now we consider the generation of the prime numbers p, q in our scheme. We
follow the scheme of generating prime numbers as in [1] rather than using the
Table 1. Choices for m and the corresponding extension fields

m     ratio    field extensions
1     1/3      GF(p) → GF(p^2) → GF(p^6)
2     2/5      GF(p) → GF(p^4) → GF(p^12)
3     2/7      GF(p) → GF(p^6) → GF(p^18)
5     4/11     GF(p) → GF(p^10) → GF(p^30)
8     8/17     GF(p) → GF(p^16) → GF(p^48)
11    10/23    GF(p) → GF(p^22) → GF(p^66)
23    22/47    GF(p) → GF(p^46) → GF(p^138)
29    28/59    GF(p) → GF(p^58) → GF(p^174)
41    40/83    GF(p) → GF(p^82) → GF(p^246)



Table 2. Choices for m and the corresponding size of a finite field

field size                        1024 bit   2048 bit   2700 bit   5100 bit
recommended m for p of 16 bits    11         23         29         41
recommended m for p of 32 bits    8          11         23         29
recommended m for p of 64 bits    5          8          11         23



method given in [2]. In the general setting, we are interested in the cases using
a small or medium sized prime number p covered in [1].

Here we describe the method of generating the prime numbers p and q that we
need. First we determine |p^{6m}| and |q| (the bit sizes of p^{6m} and q) for general
security considerations according to [4]. Then we decide m so that p is of
the word size of the processor to be used.

We repeat selecting p until p^{2m} - p^m + 1 has a prime factor of size |q|. We
refer to the result of [1], which indicates that this works sufficiently quickly in
practice. As m grows, and hence p gets smaller, there are fewer appropriate p, q's
than in the XTR case (assuming comparable levels of security). The exact
distribution of such primes p, q is not known so far. Since this is a one-time cost,
it is not a serious disadvantage.

Now consider the generation of c = Tr(g). The most elementary way to
generate c = Tr(g), where g ∈ GF(p^{6m}) has order q and q divides p^{2m} - p^m + 1,
is as follows. As usual, first we randomly generate h ∈ GF(p^{6m}), check that
h^{(p^{6m} - 1)/q} ≠ 1, and set g = h^{(p^{6m} - 1)/q}. Then such g has order q. We
compute Tr(g). But here we have lemmas which come in handy when we construct a
suitable generator g of a subgroup. The proofs of these lemmas are similar to
those for XTR in GF(P^6).
Lemma 6. Let m be a positive integer such that either 2m + 1 is a Fermat prime
or both m and 2m + 1 are primes. Suppose F(c, X) is irreducible over GF(p^{2m})
and g ∈ GF(p^{6m}) is a root of F(c, X). Then we have c = Tr(g) and c_n = Tr(g^n),
and the multiplicative order q of g divides p^{2m} - p^m + 1.

Lemma 7. F(c, X) is reducible over GF(p^{2m}) if and only if c_{p^m+1} ∈ GF(p^m).

By using this lemma, we have a similar algorithm as in XTR to generate
c = Tr(g), where g ∈ GF(p^{6m}) has prime order q and is not contained in any
proper subfield of GF(p^{6m}).

Algorithm to generate c = Tr(g) for our purpose

1. Choose c ∈ GF(p^{2m}) \ GF(p^m).

2. Check if c_{p^m+1} ∈ GF(p^m). If it is, go to 1.

3. Compute c_{(p^{2m} - p^m + 1)/q} and check if it is 3. If it is not 3, then set
c = c_{(p^{2m} - p^m + 1)/q}.

This c is what we wanted.




The Two Faces of Lattices in Cryptology 



Phong Q. Nguyen

École Normale Supérieure, Département d’Informatique,
45 rue d’Ulm, 75005 Paris, France
pnguyen@ens.fr and http://www.di.ens.fr/~pnguyen/



Abstract. Lattices are regular arrangements of points in n-dimensional
space, whose study appeared in the 19th century in both number theory
and crystallography. Since the appearance of the celebrated
Lenstra-Lenstra-Lovász lattice basis reduction algorithm twenty years ago,
lattices have had surprising applications in cryptology. Until recently, the
applications of lattices to cryptology were only negative, as lattices were
used to break various cryptographic schemes. Paradoxically, several
positive cryptographic applications of lattices have emerged in the past five
years: there now exist public-key cryptosystems based on the hardness of
lattice problems, and lattices play a crucial role in a few security proofs.
In this talk, we will try to survey the main examples of the two faces of
lattices in cryptology. The full material of this talk appeared in [2]. A
preliminary version can be found in [1].
Abstract. We present a new message authentication code. It is based
on a two-trail construction, which underlies the unkeyed hash function
RIPEMD-160. In comparison with the MDx-MAC based on
RIPEMD-160, it is much more efficient on short messages (that is, on
messages of 512 or 1024 bits) and, percentage-wise, slightly more efficient
on long messages. Moreover, it handles key changes very efficiently. This
advantage remains if we compare our Two-Track-MAC with HMAC
based on RIPEMD-160.



1 Introduction 

Message Authentication Codes (MACs) are symmetric-key cryptographic
primitives used to provide data integrity and symmetric data origin authentication.
Given a message M to be authenticated and a secret key K (shared between two
parties), the MAC algorithm computes an authentication tag A = MAC(K, M)
for the message. The pair (M, A) is passed from sender to receiver, who can
verify the authentication tag by computing the MAC of the message himself (as he
knows the key).

The goal of an adversary (who does not know the key) is to forge a MAC for a
message of his choice (selective forgery), or for an arbitrary message (existential
forgery). Here it is assumed that the adversary has knowledge of a number of
messages M^i and their corresponding authentication tags A^i = MAC(K, M^i).
In the case of a chosen-text attack the opponent is even able to request the MAC
for a number of messages of his choice (before forging a MAC on a new, and
different, message).

* Algorithm invented while working at debis Information Security Services - Bonn, 
Germany. 

** The work described in this paper has been supported in part by the Commission of
the European Communities through the IST Programme under Contract IST-1999-
12324 and by the Concerted Research Action (GOA) Mefisto-666.
It is a common approach to construct MAC algorithms from existing cryptographic
hash functions, as such schemes require little additional implementation
effort. They are also generally faster than MACs which are based on block ci- 
phers. A cryptographic hash function is a function which compresses an input of 
arbitrary length into a hash value of fixed length, while also satisfying some ad- 
ditional cryptographic properties (preimage resistance and collision resistance). 
A hash function usually works by iteration of a compression function, which has 
a fixed-length message input operating on an internal state variable. The final 
value of this internal state then serves as hash value. 

To build a MAC algorithm from a hash function it is necessary to include
a secondary input, the secret key, in the computation. Early proposals such as
the envelope method [6], where the key material is simply prepended and
appended to the message input to the hash function, were shown to have significant
weaknesses [4,5]. MDx-MAC and HMAC have emerged as the most secure
alternatives. MDx-MAC [4], which can be based on MD5, RIPEMD, SHA or similar
algorithms, makes some small changes to the hash function used, while HMAC
[1] is a black box construction that can be based on any hash function.

In this paper we present a new MAC algorithm, called Two-Track-MAC
(or TTMAC for short). It has been submitted as a candidate algorithm for the
NESSIE project [3]. The algorithm is based on the RIPEMD-160 hash function
[2] (making only small changes to the hash function). We will show that the
structure of RIPEMD, which consists of two parallel trails, has been exploited to
double the size of the internal state, and that this allows us to significantly reduce
the overhead in the computation of the MAC for short messages, compared
to the other MAC constructions. Another advantage of our proposal is better
efficiency in the case of frequent key changes. These properties are very useful
in applications, e.g., banking applications, where many short messages need to
be authenticated (with frequent key changes). Although there is no formal proof
of security for our construction, based on the heuristic arguments presented in
Section 3, we believe it is very unlikely that an attack can be found on
Two-Track-MAC which would not also breach the security of RIPEMD-160.

The remainder of this paper is organized as follows. In Section 2 we present 
our new MAC. Section 3 discusses the security, and Section 4 the efficiency of our 
proposal. In Section 5 we suggest a more general construction method that can 
be used to construct new schemes. Section 6 concludes the paper. Pseudo-code 
for our algorithm is given in the Appendix. 



2 Presentation of Two-Track-MAC 

The unkeyed hash function RIPEMD-160 (for a description we refer to [2]) uses
two trails in its compression function. If we separate those two trails then each
trail can be seen as a transformation of a 160-bit input I, controlled by a message
M consisting of sixteen words of 32 bits. Those 160 bits of the input I (and of
the output) consist of five words of 32 bits. Call the outputs of the two trails
L(I, M) and R(I, M) (left and right trail output, respectively, for an input I and a
message M). Then our proposal for a MAC on a relatively short message M (of
512 bits) and a key K of 160 bits is (in short notation)

R(K, M) − L(K, M).

Or, as R(K, M) can be viewed as five words A_i of 32 bits, (A_0, A_1, A_2, A_3, A_4),
and similarly the value L(K, M) as (B_0, B_1, B_2, B_3, B_4), we get an output E =
(E_0, E_1, E_2, E_3, E_4) of five 32-bit words. Here

E_i = A_i − B_i (subtraction modulo 2^32) for i = 0, 1, 2, 3, 4.

Then the 160-bit string E is the MAC of the 512-bit message M. Figure 1 gives
a schematic view of this computation.



Fig. 1. High level view of TTMAC for a message of a single block.



If the message is longer, i.e. M = M_1 M_2 M_3 ... M_n where each M_i is of length
512 bits, we define, using a new operation L* and a new operation R*, the 160-bit
quantity A and the 160-bit quantity B, respectively: A = (A_0, A_1, A_2, A_3, A_4) =
L*(K, M_1), where each A_i is a 32-bit word, and B = (B_0, B_1, B_2, B_3, B_4) =
R*(K, M_1) as the result of the right trail. The operation L* is based on the
operation L, which has a straightforward inverse operation on the first (160
bits long) argument. The new operation L* has a simple feedback with the first
argument, i.e.

L*(I, M) = L(I, M) − I

(this is five times a subtraction modulo 2^32). Similarly the operation R* is defined
in shorthand as

R*(I, M) = R(I, M) − I

(this is five times a subtraction modulo 2^32). Now we introduce two 160-bit blocks
C and D of five 32-bit words, C = (C_0, C_1, C_2, C_3, C_4) and D = (D_0, D_1, D_2, D_3, D_4),
which are defined as follows:



C_2 = A_3 − B_0,
C_3 = A_4 − B_1,
C_4 = A_0 − B_2,
C_0 = (A_1 + A_4) − B_3,
C_1 = A_2 − B_4,

D_1 = (A_4 + A_2) − B_0,
D_2 = A_0 − B_1,
D_3 = A_1 − B_2,
D_4 = A_2 − B_3,
D_0 = A_3 − B_4.

All subtractions and additions are modulo 2^32. These 160-bit blocks C and D are
the starting values for the left and right trail, respectively, to incorporate the next
512-bit message block M_2. If there are more message blocks M_i the iteration is
the same. So we have

HL(1) = A = L*(K, M_1),
HR(1) = B = R*(K, M_1),

and then iteratively (for i = 2, ..., n − 1) the three operations

(A, B) → (C, D),
HL(i) = A = L*(C, M_i),
HR(i) = B = R*(D, M_i).

For the last message block M_n, however, the roles of the left and right trails are
interchanged:

(A, B) → (C, D),
HL(n) = A = R*(C, M_n),
HR(n) = B = L*(D, M_n).

Once we have HL(n) and HR(n) we define our MAC TTMAC(K, M) as
HL(n) − HR(n) (five times a subtraction modulo 2^32). In Figure 2 a schematic
view of the computation is given for a message consisting of two blocks.
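As an illustration only, the iteration above can be sketched in Python. The names `sub`, `mix` and `ttmac_chain` are assumptions introduced for this sketch, and the callables `left` and `right` merely stand in for the trail functions L and R, which are not implemented here; padding is also left out.

```python
MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words, modulo 2^32


def sub(x, y):
    """Word-wise subtraction modulo 2^32 of two 5-word tuples."""
    return tuple((a - b) & MASK for a, b in zip(x, y))


def mix(A, B):
    """Map the trail outputs (A, B) to the next starting values (C, D)."""
    C = ((A[1] + A[4] - B[3]) & MASK, (A[2] - B[4]) & MASK,
         (A[3] - B[0]) & MASK, (A[4] - B[1]) & MASK, (A[0] - B[2]) & MASK)
    D = ((A[3] - B[4]) & MASK, (A[4] + A[2] - B[0]) & MASK,
         (A[0] - B[1]) & MASK, (A[1] - B[2]) & MASK, (A[2] - B[3]) & MASK)
    return C, D


def ttmac_chain(key, blocks, left, right):
    """Chain both trails over all blocks; `left`/`right` play the role of L and R."""
    if not blocks:
        raise ValueError("at least one (padded) message block is required")
    C = D = tuple(key)
    n = len(blocks)
    for i, M in enumerate(blocks, start=1):
        if i < n:
            A = sub(left(C, M), C)    # A = L*(C, M_i), feedback of the input
            B = sub(right(D, M), D)   # B = R*(D, M_i)
            C, D = mix(A, B)          # starting values for the next block
        else:
            A = sub(right(C, M), C)   # last block: the trails swap roles
            B = sub(left(D, M), D)
    return sub(A, B)                  # HL(n) - HR(n)
```

For a single block the two feedback terms cancel, so the result equals R(K, M) − L(K, M), in line with the short-message definition.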

The same preprocessing rules as in the RIPEMD-160 hash function are used
to format the message input to the algorithm (the message is padded to a
bit length which is a multiple of 512). An additional output transformation can
be used to reduce the length of the MAC result. This transformation calculates
the necessary number of output words, in such a manner that all of the normal
output words are used. Let the normal 160-bit result be E' = (E'_0, E'_1, E'_2, E'_3, E'_4),
and denote the final (shortened) MAC result by E, consisting of t 32-bit words
E_i (t = 1, 2, 3, or 4). For a 32-bit MAC we compute (using addition modulo 2^32)

E_0 = E'_0 + E'_1 + E'_2 + E'_3 + E'_4.

Fig. 2. High level view of TTMAC for a message of two blocks.

For a MAC result of 64, 96 or 128 bits we compute respectively the first two,
the first three or all four of the following values (all additions are modulo 2^32):

E_0 = E'_0 + E'_1 + E'_3,
E_1 = E'_1 + E'_2 + E'_4,
E_2 = E'_2 + E'_3 + E'_0,
E_3 = E'_3 + E'_4 + E'_1.
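The output transformation can be sketched as follows. This is a minimal Python illustration; the function name `shorten_mac` and the tuple representation of the five result words are assumptions made for the example.

```python
MASK = 0xFFFFFFFF  # additions are modulo 2^32


def shorten_mac(full, t):
    """Reduce the normal 5-word result to t words (t = 1, 2, 3 or 4)."""
    e0, e1, e2, e3, e4 = full
    if t == 1:
        # 32-bit MAC: the sum of all five words
        return ((e0 + e1 + e2 + e3 + e4) & MASK,)
    if t not in (2, 3, 4):
        raise ValueError("t must be 1, 2, 3 or 4")
    words = ((e0 + e1 + e3) & MASK,
             (e1 + e2 + e4) & MASK,
             (e2 + e3 + e0) & MASK,
             (e3 + e4 + e1) & MASK)
    return words[:t]  # first two, first three, or all four values
```

Every shortened result with t ≥ 2 draws on all five words of the normal output, which is the property the text asks for.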

3 Security of Our Proposal 

3.1 Philosophy on the Security 

The idea for the security is simple: we now have an internal state variable
(HL(i), HR(i)) of 320 bits. This is twice as long as for other MAC constructions
(e.g., MDx-MAC and HMAC) based on RIPEMD-160 (or on SHA-1). Only in
the case of very weak transformations can a cryptanalyst hope for so-called
internal collisions. In almost all attacks which do not attack the very
heart of the MAC (in our case the two trails of RIPEMD-160), forgery is based
on internal collisions.

Another attack is possible if the MAC on a message contains all the informa- 
tion (or lacks “only” 32 bits of information) of the internal variable on a longer 
message, containing the first message as a prefix. In our case we have an internal 
state of 320 bits, so we can use 160 bits of information (the difference between 
the left and the right state variable, this depends on both trails) as the MAC 
output without compromising the internal state. Furthermore, we have used the 
idea of interchanging the left and right trails for the last message block as a free 
extra defence against such extension attacks. Note that other MAC construc- 
tions need to apply the compression function with some secret key material at 
the end of the computation in order to prevent these attacks. In our case, the 
secret key is only used as initial value for the two trails. 

So now the worry for the cryptographer is the two trails of RIPEMD-160
themselves. A single trail has one important weakness: it is a bijective operation,
where the attacker can choose the bijection, which is parameterized by the 512-bit
quantity M_i. But as long as two trails are used, parameterized by the same
512-bit quantity M_i, and only a sum will come out in the open, there is no danger
that an attacker can invert the operation. Moreover, we have used feedback to
counter a straightforward inverse operation. (We do not use feedback on messages
of 512 bits, because there the feedback from the left trail would cancel out the
feedback from the right trail; in other words we do not need feedback there.)
All this makes the transformation of a new 512-bit message block on the 320-bit
internal variable a one-way operation.

Suppose a cryptanalyst discovers a message N such that the function L*(·, N),
from a 160-bit left-side argument to a 160-bit output, has only a few short cycles
(and many relatively short tails ending in those cycles). Such a discovery is useless
because we have chosen to mix the outputs of the two trails as soon as the
functions L* and R* have output their results. (Otherwise it might be possible
for the cryptanalyst to generate collisions for the left trail by appending blocks
to the message, and separately trying to find collisions for the right trail.) This
mixing is thorough in the following sense: denote the outcome of the left trail
by A, the outcome of the right trail by B (as we did before), and denote the MAC,
in case we are done (in case this was the last message block), by E. Now
denote by C and D the starting values for the new trails (in case it was not the
last block). Then each pair out of the five values A, B, C, D and E has (just)
enough information to determine the other three values. This ensures all kinds
of injectivity properties.
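Two instances of this claim can be checked with a short Python sketch: B is recovered from the pair (A, E), and (A, B) is recovered from the pair (C, D). The inversion order used in `unmix` is our own derivation from the definitions of C and D in Section 2, not something stated in the paper, and all function names are assumptions of this example.

```python
MASK = 0xFFFFFFFF  # arithmetic on 32-bit words, modulo 2^32


def mix(A, B):
    """(A, B) -> (C, D) as defined in Section 2."""
    C = ((A[1] + A[4] - B[3]) & MASK, (A[2] - B[4]) & MASK,
         (A[3] - B[0]) & MASK, (A[4] - B[1]) & MASK, (A[0] - B[2]) & MASK)
    D = ((A[3] - B[4]) & MASK, (A[4] + A[2] - B[0]) & MASK,
         (A[0] - B[1]) & MASK, (A[1] - B[2]) & MASK, (A[2] - B[3]) & MASK)
    return C, D


def mac(A, B):
    """E = A - B, word by word."""
    return tuple((a - b) & MASK for a, b in zip(A, B))


def recover_b(A, E):
    """The pair (A, E) determines B = A - E."""
    return tuple((a - e) & MASK for a, e in zip(A, E))


def unmix(C, D):
    """The pair (C, D) determines (A, B); solved one word at a time."""
    a4 = (D[1] - C[2] + D[0] - C[1]) & MASK   # B_0 and B_4 cancel out
    b1 = (a4 - C[3]) & MASK
    a0 = (D[2] + b1) & MASK
    b2 = (a0 - C[4]) & MASK
    a1 = (D[3] + b2) & MASK
    b3 = (a1 + a4 - C[0]) & MASK
    a2 = (D[4] + b3) & MASK
    b4 = (a2 - C[1]) & MASK
    a3 = (D[0] + b4) & MASK
    b0 = (a3 - C[2]) & MASK
    return (a0, a1, a2, a3, a4), (b0, b1, b2, b3, b4)
```

The remaining pairs would need analogous derivations, which are not attempted here.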

The information theoretic uncertainty about the pair (C, D) can, of course,
not be larger than the uncertainty about the key, that is, 160 bits. But the goal
of the design is that RIPEMD-160 is so “complicated” that the “virtual”
uncertainty about (C, D) is 320 bits. The construction is such that the information
in any pair of the five-tuple (A, B, C, D, E) is 320 bits. So if the uncertainty
about (C, D) is 320 bits then the uncertainty about, for example, the pair (A, E)
is also 320 bits. This means that given all information about E, the uncertainty
about A is still 160 bits. In other words the attacker has “virtually” no information
about the starting value for the new left trail. Of course he has “virtual”
information about the pair (A, B), but that is more “complicated” than having
information about a single trail.

3.2 Resistance against General Attacks 

The resistance of Two-Track-MAC against forgery attacks which are generally
applicable depends on the following parameters: the key length k, which is 160
bits, the output length m, which can be between 32 and 160 bits (in 32-bit steps),
and the length l of the internal state, which is 320 bits.

A first possible approach for an adversary is trying all possible keys (once he
recovers the key he is able to forge the MAC for any message he chooses). For a
key length k and output length m, such an attack requires about 2^k trials and
k/m known text-MAC pairs (for verification of the attack).

Alternatively, the adversary can just guess the MAC corresponding to a chosen
message. His success probability will be 1/2^m, although this attack is not
verifiable. The parameter m should be chosen long enough according to the
needs of the application.

The forgery attack based on internal collisions requires about 2^(l/2) known
text-MAC pairs to find an internal collision (with a birthday attack), and 2^(l−m)
chosen texts to distinguish the internal collision from the external ones (this is
shown in [4]). Table 1 below summarizes the difficulty of these general attacks
applied to Two-Track-MAC.



Table 1. Resistance of Two-Track-MAC against general attacks. The output length
m can take the following values: 32, 64, 96, 128 or 160 bits.

attack               trials    success prob.    known pairs    chosen texts
key search           2^160                      160/m
guessing the MAC               1/2^m
internal collision                              2^160          2^(320−m)
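The figures for the general attacks follow directly from the parameters k, m and l of Section 3.2. The Python sketch below tabulates them as base-2 logarithms; the function name, the dictionary layout, and the rounding of k/m up to a whole number of pairs are assumptions of this illustration.

```python
from math import ceil


def general_attack_costs(m, k=160, l=320):
    """Return the generic attack costs for output length m (in bits)."""
    if m not in (32, 64, 96, 128, 160):
        raise ValueError("m must be 32, 64, 96, 128 or 160")
    return {
        # exhaustive key search: 2^k trials, about k/m pairs to verify a key
        "key_search": {"log2_trials": k, "known_pairs": ceil(k / m)},
        # guessing the MAC: success probability 2^-m, not verifiable
        "guessing": {"log2_success_prob": -m},
        # internal collision: 2^(l/2) known pairs and 2^(l-m) chosen texts
        "internal_collision": {"log2_known_pairs": l // 2,
                               "log2_chosen_texts": l - m},
    }
```

For example, `general_attack_costs(32)` reports 5 known pairs for key search and 2^288 chosen texts for the internal collision attack.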



4 Short Comparison on the Efficiency of Two-Track-MAC 



Our MAC uses only a few percent more operations on a message than RIPEMD-160
would need to compute an unkeyed hash of the message (about 97% of the speed
of RIPEMD-160 is achieved). This is already the case for the shortest possible
message of 512 bits. In contrast, both MDx-MAC and HMAC require an extra
computation of the underlying compression function (using a secret key) at the
end of the MAC computation. That is relatively costly on messages of just
one (512-bit) block (less than 50% of the speed of unkeyed hashing is achieved).

Also, since the secret key only serves as an initial value for the computation, a
key change will not slow down the computation of Two-Track-MAC.
In the case of HMAC or MDx-MAC a key change costs respectively two or six
extra computations of the underlying compression function.

5 General Construction 

Our new MAC construction is not dependent on RIPEMD-160 alone. It just
needs two operations L and R : (T1 × T2) → T1. The set T1 should be big enough
to make collisions improbable, for example GF(2^160). The size of the set T2
should be chosen big if messages are expected to be long. The operations L and
R are allowed to be invertible if the second argument is fixed, but the operations
L* and R* (including feedback from the first input) should be infeasible to invert.
The operations L and R might be bijective in the first argument, but they should
behave unpredictably on changes in the second argument, if the first argument
is unknown (but perhaps fixed). It would be even better if the change in the
output of the function L, say, is unpredictable with known first argument, i.e.,
the only way to know the effect of a change is to compute the new function
value. Based on the experience that a first version of RIPEMD was partially
broken, it is recommended that L and R should be as different as possible. In
the case that T1 contains all 160-bit strings, one can use the same transitions
(A, B) → (C, D) as we use for Two-Track-MAC based on RIPEMD-160. One of
course also needs to define a padding rule, as the message length needs to be a
multiple of some fixed quantity. With those transitions and a padding rule one
can define the MAC on a message of any length.

6 Conclusion 

We have presented a new message authentication code based on the two trail 
construction which underlies RIPEMD-160. The main advantage of the scheme 
is that it is more efficient than other schemes based on RIPEMD-160, especially 
in the case of short messages and frequent key-changes. We also suggested a 
more general construction method which can be used to construct new schemes. 

A Pseudo-code for Two-Track-MAC 

The TTMAC algorithm computes a 160-bit MAC value for an arbitrary message,
under a 160-bit key. The result can be transformed into a shorter value by use
of the output transformation (not reflected in the pseudo-code below). First we
define all the constants and functions.

TTMAC: definitions

nonlinear functions at bit level: exor, mux, -, mux, -

f(j, x, y, z) = x ⊕ y ⊕ z                (0 ≤ j ≤ 15)
f(j, x, y, z) = (x ∧ y) ∨ (¬x ∧ z)       (16 ≤ j ≤ 31)
f(j, x, y, z) = (x ∨ ¬y) ⊕ z             (32 ≤ j ≤ 47)
f(j, x, y, z) = (x ∧ z) ∨ (y ∧ ¬z)       (48 ≤ j ≤ 63)
f(j, x, y, z) = x ⊕ (y ∨ ¬z)             (64 ≤ j ≤ 79)

added constants (hexadecimal)

c(j) = 00000000x     (0 ≤ j ≤ 15)
c(j) = 5A827999x     (16 ≤ j ≤ 31)     ⌊2^30 · √2⌋
c(j) = 6ED9EBA1x     (32 ≤ j ≤ 47)     ⌊2^30 · √3⌋
c(j) = 8F1BBCDCx     (48 ≤ j ≤ 63)     ⌊2^30 · √5⌋
c(j) = A953FD4Ex     (64 ≤ j ≤ 79)     ⌊2^30 · √7⌋
c'(j) = 50A28BE6x    (0 ≤ j ≤ 15)      ⌊2^30 · ∛2⌋
c'(j) = 5C4DD124x    (16 ≤ j ≤ 31)     ⌊2^30 · ∛3⌋
c'(j) = 6D703EF3x    (32 ≤ j ≤ 47)     ⌊2^30 · ∛5⌋
c'(j) = 7A6D76E9x    (48 ≤ j ≤ 63)     ⌊2^30 · ∛7⌋
c'(j) = 00000000x    (64 ≤ j ≤ 79)

selection of message word 

r(j) = j (0 ≤ j ≤ 15)



r(16..31) = 7,4,13,1,10,6,15,3,12,0,9,5,2,14,11,8 

r(32..47) = 3, 10, 14, 4, 9, 15, 8, 1, 2, 7, 0, 6, 13, 11, 5, 12 

r(48..63) = 1,9,11,10,0,8,12,4,13,3,7,15,14,5,6,2 

r(64..79) = 4, 0, 5, 9, 7, 12, 2, 10, 14, 1, 3, 8, 11, 6, 15, 13 

r'(0..15) = 5, 14, 7, 0, 9, 2, 11, 4, 13, 6, 15, 8, 1, 10, 3, 12 

r'(16..31) = 6, 11, 3, 7, 0, 13, 5, 10, 14, 15, 8, 12, 4, 9, 1, 2 

r'(32..47) = 15, 5, 1, 3, 7, 14, 6, 9, 11, 8, 12, 2, 10, 0, 4, 13 

r'(48..63) = 8, 6, 4, 1, 3, 11, 15, 0, 5, 12, 2, 13, 9, 7, 10, 14 

r'(64..79) = 12, 15, 10, 4, 1, 5, 8, 7, 6, 2, 13, 14, 0, 3, 9, 11 

amount for rotate left (rol)

s(0..15) = 11, 14, 15, 12, 5, 8, 7, 9, 11, 13, 14, 15, 6, 7, 9, 8 

s(16..31) = 7, 6, 8, 13, 11, 9, 7, 15, 7, 12, 15, 9, 11, 7, 13, 12 

s(32..47) = 11, 13, 6, 7, 14, 9, 13, 15, 14, 8, 13, 6, 5, 12, 7, 5 

s(48..63) = 11, 12, 14, 15, 14, 15, 9, 8, 9, 14, 5, 6, 8, 6, 5, 12 

s(64..79) = 9, 15, 5, 11, 6, 8, 13, 12, 5, 12, 13, 14, 11, 8, 5, 6 

s'(0..15) = 8, 9, 9, 11, 13, 15, 15, 5, 7, 7, 8, 11, 14, 14, 12, 6 

s'(16..31) = 9, 13, 15, 7, 12, 8, 9, 11, 7, 7, 12, 7, 6, 15, 13, 11 

s'(32..47) = 9, 7, 15, 11, 8, 6, 6, 14, 12, 13, 5, 14, 13, 13, 7, 5 

s'(48..63) = 15, 5, 8, 11, 14, 14, 6, 14, 6, 9, 12, 9, 12, 5, 15, 8 

s'(64..79) = 8, 5, 12, 9, 12, 5, 14, 6, 8, 13, 6, 5, 15, 13, 11, 11

It is assumed that the message after padding consists of n 16-word blocks that
will be denoted by M_i[j], with 1 ≤ i ≤ n and 0 ≤ j ≤ 15. The key used and the
MAC value obtained consist of five words each, respectively (K_0, K_1, K_2, K_3, K_4)
and (E_0, E_1, E_2, E_3, E_4). The symbols ⊞ and ⊟ denote respectively addition
and subtraction modulo 2^32; rol_s denotes cyclic left shift (rotate) over s positions.
The pseudo-code for TTMAC is then given below.

TTMAC: pseudo-code

C_0 := K_0; C_1 := K_1; C_2 := K_2; C_3 := K_3; C_4 := K_4;
D_0 := K_0; D_1 := K_1; D_2 := K_2; D_3 := K_3; D_4 := K_4;
for i := 1 to n {
    A_0 := C_0; A_1 := C_1; A_2 := C_2; A_3 := C_3; A_4 := C_4;
    B_0 := D_0; B_1 := D_1; B_2 := D_2; B_3 := D_3; B_4 := D_4;
    if (i != n) for j := 0 to 79 {
        T := rol_s(j)(A_0 ⊞ f(j, A_1, A_2, A_3) ⊞ M_i[r(j)] ⊞ c(j)) ⊞ A_4;
        A_0 := A_4; A_4 := A_3; A_3 := rol_10(A_2); A_2 := A_1; A_1 := T;
        T := rol_s'(j)(B_0 ⊞ f(79 − j, B_1, B_2, B_3) ⊞ M_i[r'(j)] ⊞ c'(j)) ⊞ B_4;
        B_0 := B_4; B_4 := B_3; B_3 := rol_10(B_2); B_2 := B_1; B_1 := T;
    }
    else for j := 0 to 79 {
        T := rol_s'(j)(A_0 ⊞ f(79 − j, A_1, A_2, A_3) ⊞ M_i[r'(j)] ⊞ c'(j)) ⊞ A_4;
        A_0 := A_4; A_4 := A_3; A_3 := rol_10(A_2); A_2 := A_1; A_1 := T;
        T := rol_s(j)(B_0 ⊞ f(j, B_1, B_2, B_3) ⊞ M_i[r(j)] ⊞ c(j)) ⊞ B_4;
        B_0 := B_4; B_4 := B_3; B_3 := rol_10(B_2); B_2 := B_1; B_1 := T;
    }
    A_0 := A_0 ⊟ C_0; A_1 := A_1 ⊟ C_1; A_2 := A_2 ⊟ C_2; A_3 := A_3 ⊟ C_3;
    A_4 := A_4 ⊟ C_4;
    B_0 := B_0 ⊟ D_0; B_1 := B_1 ⊟ D_1; B_2 := B_2 ⊟ D_2; B_3 := B_3 ⊟ D_3;
    B_4 := B_4 ⊟ D_4;
    if (i != n) {
        C_2 := A_3 ⊟ B_0; C_3 := A_4 ⊟ B_1; C_4 := A_0 ⊟ B_2; C_0 := (A_1 ⊞ A_4) ⊟ B_3;
        C_1 := A_2 ⊟ B_4;
        D_1 := (A_4 ⊞ A_2) ⊟ B_0; D_2 := A_0 ⊟ B_1; D_3 := A_1 ⊟ B_2;
        D_4 := A_2 ⊟ B_3; D_0 := A_3 ⊟ B_4;
    }
}
E_0 := A_0 ⊟ B_0; E_1 := A_1 ⊟ B_1; E_2 := A_2 ⊟ B_2; E_3 := A_3 ⊟ B_3;
E_4 := A_4 ⊟ B_4.

For a short message of up to 512 bits (one 16-word block after padding), no
feedback or mixing is required and the following simplified pseudo-code can be
used.

TTMAC (one message block): pseudo-code

A_0 := K_0; A_1 := K_1; A_2 := K_2; A_3 := K_3; A_4 := K_4;
B_0 := K_0; B_1 := K_1; B_2 := K_2; B_3 := K_3; B_4 := K_4;
for j := 0 to 79 {
    T := rol_s'(j)(A_0 ⊞ f(79 − j, A_1, A_2, A_3) ⊞ M[r'(j)] ⊞ c'(j)) ⊞ A_4;
    A_0 := A_4; A_4 := A_3; A_3 := rol_10(A_2); A_2 := A_1; A_1 := T;
    T := rol_s(j)(B_0 ⊞ f(j, B_1, B_2, B_3) ⊞ M[r(j)] ⊞ c(j)) ⊞ B_4;
    B_0 := B_4; B_4 := B_3; B_3 := rol_10(B_2); B_2 := B_1; B_1 := T;
}
E_0 := A_0 ⊟ B_0; E_1 := A_1 ⊟ B_1; E_2 := A_2 ⊟ B_2; E_3 := A_3 ⊟ B_3;
E_4 := A_4 ⊟ B_4.
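A single step of one trail, as used inside the loops of the pseudo-code, might look as follows in Python. This is only a sketch: the function names `rol`, `f` and `step`, and the representation of the five chaining words as a tuple, are assumptions of the example, and the tables r, s and c are passed in by the caller rather than reproduced here.

```python
MASK = 0xFFFFFFFF  # 32-bit words


def rol(x, s):
    """Cyclic left shift of a 32-bit word over s positions (0 < s < 32)."""
    return ((x << s) | (x >> (32 - s))) & MASK


def f(j, x, y, z):
    """Bit-level nonlinear function selected by the step number j (0..79)."""
    if j < 16:
        return x ^ y ^ z
    if j < 32:
        return (x & y) | (~x & MASK & z)
    if j < 48:
        return (x | (~y & MASK)) ^ z
    if j < 64:
        return (x & z) | (y & ~z & MASK)
    return x ^ (y | (~z & MASK))


def step(state, j, word, const, shift):
    """Apply one step to the five chaining words (A_0, ..., A_4)."""
    a0, a1, a2, a3, a4 = state
    t = (rol((a0 + f(j, a1, a2, a3) + word + const) & MASK, shift) + a4) & MASK
    # A_0 := A_4; A_4 := A_3; A_3 := rol_10(A_2); A_2 := A_1; A_1 := T
    return (a4, t, a1, rol(a2, 10), a3)
```

The left trail would call `step(state, j, M[r[j]], c[j], s[j])`, while the right trail passes 79 − j as the function index together with the primed tables.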
Abstract. We present data structures for complement covering with
intervals and their application for digital identity revocation. We give
lower bounds showing the structures to be nearly optimal. Our method
improves upon the schemes proposed by S. Micali [5,6] and Aiello, Lodha,
Ostrovsky [1] by reducing the communication between a Certificate
Authority and public directories while keeping the number of tokens per
user in the public key certificate small.



1 Introduction 

Digital identities play an essential role in many cryptographic applications. In- 
frastructures for digital identities are built by means of public-key cryptography 
and Certification Authorities. The schemes differ in how digital identities can
be checked to be valid and how the identities can be revoked.

A digital identity is validated by a certificate issued by a Certification Au- 
thority (CA). The CA initially uses a public key generation process to create a 
public key/secret key pair. The public key together with a fingerprint is pub- 
lished. A user u who wants to establish his own digital identity creates a new 
public key/secret key pair and sends the public key together with identifying in- 
formation to the CA. The Certificate Authority checks u’s identity to ensure that 
the user is really the person he/she claims to be. After that, the CA signs with 
its secret key a certificate containing u’s public key, the identifying information 
and an expiration date of this certification. Hence, anyone is able to check the 
certificate issued by the CA with the CA’s public key. To accept u’s public
key, one need not trust the user u himself, only the CA. To establish higher levels
of trust, one can use a hierarchy of CAs.

A digital identity is valid as long as its certificate has not expired. In contrast
to this, we must also have a means for revoking users. Assume u’s identity is
stolen or compromised before the certificate expiration date. The thief can sign
arbitrary messages with u’s secret key. Hence, as in the case of credit cards, one
must establish an immediate identity revocation.

There are many solutions proposed in the literature for revoking digital
identities. The first one is a centralized online solution where a trusted database
holds the status of each public key. The database answers queries about public
keys. However, these answers must be authenticated by the database to avoid
man-in-the-middle attacks. In many cases the method is impractical because
online access is required.

Another solution, the Certificate Revocation List (CRL), is widely used in 
practice. In this offline approach, the CA makes a list of all users revoked thus 
far and signs it. This list is distributed at regular intervals - for example during a 
daily update period - to many public directories. A public directory is untrusted 
but one insists that it cannot cheat and must return a user’s revocation status 
when queried. The main drawback of this scheme is the time it takes to check a 
key’s validity. One must first check the CA signature and then look at the whole 
list of revoked users. Consider a fixed update time and let r be the total number
of revoked users up to this point. The CA has to communicate a CRL of size
O(r) to each public directory in order to update the status of the keys. The time
to check a key’s validity using the CRL is also O(r).

There are two other offline schemes proposed by Kocher [3] and Naor, Nissim
[7]. They make use of authenticated hash trees. For a fixed update time and
r as defined above the communication from the CA to the public directories is
reduced to O(log r). In order to check a user’s identity, one receives O(log r) hash
values from the directory and computes another O(log r) hash values. These values
are compared with the public directory data in a specified way. Additionally,
the root signature of the authenticated hash tree is checked.

The main drawbacks of the offline solutions mentioned so far are: 

— The information sent by the CA must be authenticated. Therefore, signing
the data is necessary. In order to prove the status, the signature must be 
checked. 

— The proof length - the amount of data that has to be checked for validation 
- is a function of r. Since normally one must prove a key’s validity very 
often, the proof length is the main bottleneck of digital identity revocation. 

S. Micali [5,6] proposed an elegant solution for these two problems based
on an idea of offline/online signatures [2], which in turn builds on work of
Lamport [4]. He suggests adding an additional number y, called the user’s
0-token, to the certificate. In order to create the 0-token, the CA picks a
random number x and a one-way hash function f and computes y = f^l(x) :=
f(f(...f(x)...)); that is, the function f is applied l times to x in order to compute
the 0-token y. The parameter l corresponds to the number of update periods, e.g.
the days till expiration. On day 1, if user u is not revoked, the CA publishes the
1-token f^(l−1)(x) of u. Since f is a one-way function, y can easily be computed
from f^(l−1)(x) by applying f once, but it is infeasible to find a valid 1-token
x' with f(x') = y. Hence, u can take the 1-token as a proof that his key is
valid on day 1. In general, the CA publishes the i-token f^(l−i)(x) on day i. This
i-token serves as a day-i proof for the validity of u’s key. Applying f i times and
comparing the result with the 0-token proves the key’s validity. In the sequel,
we will use the terms token and proof synonymously. Notice that in contrast to
the schemes of Kocher [3] and Naor, Nissim [7] this scheme needs only one proof
for key validation and no signature of the CA in the daily update period.

Let U = {1, 2, ..., n} be the set of users and let 2^U denote the power set of
U. For a fixed update time let R ⊆ U be the set of all users revoked so far. We
set r = |R|. The complement R̄ = U − R of R is the set of non-revoked users.
The problem with Micali’s scheme is that each of the n − r non-revoked users in
R̄ obtains his own proof during an update period. Hence in each update period
the CA has to communicate n − r tokens to a public directory. We denote this
as the CA-to-directory communication.
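The token chain of Micali’s scheme can be illustrated with a few lines of Python. SHA-256 is used for the one-way function f purely as an assumption of this sketch (the scheme only requires some one-way hash function), and the helper names are hypothetical.

```python
import hashlib


def f(value: bytes) -> bytes:
    """One-way function; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(value).digest()


def iterate(value: bytes, times: int) -> bytes:
    """Apply f the given number of times."""
    for _ in range(times):
        value = f(value)
    return value


def zero_token(x: bytes, l: int) -> bytes:
    """0-token y = f^l(x), stored in the certificate."""
    return iterate(x, l)


def day_token(x: bytes, l: int, i: int) -> bytes:
    """i-token f^(l-i)(x), published by the CA on day i."""
    return iterate(x, l - i)


def verify(token: bytes, i: int, y: bytes) -> bool:
    """Apply f i times to the day-i token and compare with the 0-token."""
    return iterate(token, i) == y
```

A verifier only needs the 0-token from the certificate and the current day number; no CA signature is involved in the daily check.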

ALO (Aiello, Lodha, Ostrovsky) [1] proposed two schemes that reduce the
CA-to-directory communication. These schemes are called the Hierarchical and the
Generalized Scheme. The main building block of the ALO schemes is a set F ⊆ 2^U.
The set F has the property that each set R̄ of non-revoked users can be written
as the union of the elements in a subset S(R̄) of F. Each element S_j ∈ F has
its own 0-token. For each set S_j ∈ F, each user u ∈ S_j stores the 0-token of S_j
in his certificate. That is, the certificate of user u contains |{S_j ∈ F : u ∈ S_j}|
different tokens. We denote the maximal number max_{u ∈ U} |{S_j ∈ F : u ∈ S_j}| of
tokens per certificate by T.

In order to issue day-i proofs for the non-revoked users u ∈ R̄, the CA
computes a cover S(R̄) = {S_{j_1}, S_{j_2}, ..., S_{j_m}}, S_{j_1} ∪ S_{j_2} ∪ ... ∪ S_{j_m} = R̄,
of the set R̄. Next, it publishes the m i-tokens of the sets S_{j_1}, S_{j_2}, ..., S_{j_m}.
Since these sets cover the set R̄, each non-revoked user u is contained in at least
one set S_j. Recall that u stores the 0-token of S_j in his certificate. Hence, the
i-token of the set S_j is a day-i proof for user u.

There may be different ways to cover R̄ by elements S_j ∈ F. In ALO’s
schemes, the CA always takes the minimal number of subsets for the cover in
order to minimize the number m of proofs. Let

max_{R̄ ⊆ U : |R̄| = n − r} {m : CA needs m sets to cover R̄}

be the maximal number of proofs that the CA has to publish for a set R̄ of size
n − r. We denote this maximal number of proofs by V.

Note that Micali’s revocation scheme fits this description. To obtain Micali’s
revocation scheme, define F as F = {{1}, {2}, ..., {n}}. Hence, the users only
have to store one 0-token in their certificate.
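To make the two quantities concrete, the following Python sketch computes the number of tokens per certificate for an arbitrary family and the cover used in Micali’s scheme. The function names and the representation of sets as `frozenset` objects are assumptions of this illustration.

```python
def tokens_per_certificate(family, n):
    """T: the largest number of sets of the family that contain one user."""
    return max(sum(1 for s in family if u in s) for u in range(1, n + 1))


def micali_family(n):
    """F = {{1}, {2}, ..., {n}}: one singleton set per user."""
    return [frozenset({u}) for u in range(1, n + 1)]


def micali_cover(n, revoked):
    """Cover of the non-revoked users: one singleton per non-revoked user."""
    return [frozenset({u}) for u in range(1, n + 1) if u not in revoked]
```

With n = 8 and revoked users {2, 5}, the family yields one token per certificate, while the cover contains n − r = 6 sets, which is the CA-to-directory communication of this scheme.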

Let us define three demands on our key revocation scenario in order of de- 
creasing priority: 

Proof of key’s validity: In our scenario, a user must prove a key’s validity 
very often. Therefore, we insist on only one proof for key validation as in the 
revocation schemes of Micali and ALO. The schemes of Kocher and Naor, 
Nissim do not meet this requirement. 

CA-to-directory communication (V): The CA-to-directory communication 
corresponds to the maximal number of proofs the CA has to send to a public 
directory. The maximal number of proofs is denoted by V. We have to keep 
the CA-to-directory communication small to allow frequent update periods. 
Thus, we want to minimize V. 

Tokens per certificate (T): We denote the number of tokens per certificate
by T. To make the scheme practical (especially for smart card applications),
T must be kept small, since checking long certificates is inefficient. But
assuming that checked keys are stored, certificates normally have to be checked
only once.

As mentioned above, Micali’s scheme has T = 1 token per certificate. However,
the CA-to-directory communication is V = n − r if r is the number of
revoked users. ALO’s Hierarchical Scheme improves upon Micali’s scheme by
reducing the CA-to-directory communication V to r log_2(n/r) while increasing the
number of 0-tokens T per certificate to log_2 n. The Generalized Scheme of ALO
needs at most r(log_c(n/r) + 1) proofs per update period and T ≤ (2^(c−1) − 1) log_c n
tokens per certificate. Due to the (2^(c−1) − 1) factor, this scheme is only practical
for c = 2 or c = 3; otherwise the certificates become too large.

Our results build on the work of ALO. We propose a new method for covering
the set R̄ of non-revoked users by intervals. To this end, we define a new class for
covering problems called interval cover family (ICF). Our ICFs are constructed
using interval trees.

The set R of revoked users partitions U into subintervals of non-revoked users,
which can be represented by the nodes of an interval tree. Our task is to find a
scheme which covers any interval with sets of an ICF. Furthermore, we want
each user u ∈ U to be in a small number of sets. This property is important because,
as in ALO’s schemes, user u must include in his certificate all the 0-tokens of
sets that contain u.

Micali’s [5,6] and ALO’s Hierarchical scheme [1] also belong to the class of
algorithms using interval cover families. The ICF in [5,6] is the simplest one. It
covers intervals by single elements. Thus, the length of the covering intervals is
always 1. In ALO’s Hierarchical scheme, the set of non-revoked users is covered
by intervals with interval lengths that are powers of 2. In this paper, we propose
two new methods for covering intervals that might be interesting for other areas
of covering problems as well.
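A cover by intervals whose lengths are powers of 2 can be illustrated as follows. The sketch assumes that n is a power of 2 and that the available sets are the aligned blocks of a binary tree over the users; this is our own simplified illustration of the idea, not a specification of ALO’s scheme, and the function name is hypothetical.

```python
def dyadic_cover(lo, hi, revoked):
    """Cover the non-revoked users in [lo, hi]; hi - lo + 1 must be a power of 2."""
    if not any(lo <= u <= hi for u in revoked):
        return [(lo, hi)]      # the whole block is non-revoked: one set suffices
    if lo == hi:
        return []              # a single revoked user is not covered
    mid = (lo + hi) // 2       # otherwise split the block into two halves
    return dyadic_cover(lo, mid, revoked) + dyadic_cover(mid + 1, hi, revoked)
```

For n = 8 and the single revoked user 3, the cover consists of the intervals [1, 2], [4] and [5, 8].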

In Section 2, we introduce the class ICF of interval cover families. Revocation
Scheme 1 (RS1) is presented in Section 3. It is a generalization of ALO’s
Hierarchical scheme. The length of the covering intervals is a power of c ≥ 2. For
RSI, we obtain the upper bounds V < (r-l- l)(21og^ n— 1) and T < log^ n 

for some constant parameter c. 

Our second Revocation Scheme (RS2) presented in Section 4 leads to a
CA-to-directory communication of V ≤ (r + 1)(log_c n + 1), while keeping the number
of 0-tokens per certificate upper bounded by T < log^ n(l -I- o(l)). 

Since the new bounds for T are polynomial in c, our systems are practical
for larger parameters c than ALO’s schemes. Thus, we can reduce the
CA-to-directory communication V by choosing a large c. Since this communication is
done during each update period, the system becomes more efficient.

In Section 5, we study the relations of the class ICF to the task of key
revocation. Using a more refined analysis, we show that RS1 has a maximal
CA-to-directory communication of V ≤ 2r(log_c n − ⌊log_c r⌋). Assuming the revoked
users to be uniformly distributed, we can further reduce the bound of RS2 to an 
expected upper bound of 7^ < (r -|- 1 — *~*'’'~^^ ))(loge + !)• 
Table 1. Comparison of our schemes with Micali’s and ALO’s schemes

Scheme                V proofs from CA to directories    T tokens per certificate
Micali                n − r                              1
ALO’s Hierarchical    r log_2(n/r)                       log_2 n
ALO’s Generalized     r(log_c(n/r) + 1)                  (2^(c−1) − 1) log_c n
RS1                   2r(log_c n − ⌊log_c r⌋)            i^log,n
RS2                   (r + 1)(log_c n + 1)               '^2^ logc«(l + o(l))

If we only want to minimize the CA-to-directory communication, an optimal 
solution can be obtained from Yao’s range query data structures [8]. However, 
Yao’s construction results in prohibitively many tokens per certificate. For the 
first time in this area, we also prove lower bounds for the number T of 0-tokens 
(see Section 6). For example. Corollary 22 provides a lower bound of T > (- — 
1) • log,, n, where e is the Euler number. This shows that if c is constant, the 
trade-off in our revocation schemes between CA-to-directory communication V 
and the number of 0-tokens T is optimal up to a constant. 

2 Definitions 

Consider the universe U = {1, 2, ..., n} of users with personal identification
numbers 1 to n. Let 2^U denote the power set of U. Let R ⊆ U be the subset
of revoked users and R̄ = U − R the complement of R. In our schemes, the CA
has to find a family of sets that covers the subset R̄ of all non-revoked users.
Then the CA issues the i-tokens for all the sets in the cover. A day-i proof for
a non-revoked user u is a set that contains u.

Definition 1 (interval set) The interval set V = [a, b], V ⊆ U, is defined as
[a, b] := {x ∈ ℕ | a ≤ x ≤ b}. The interval set [a, a] is briefly written as [a]. The
length of an interval set [a, b] is defined as b − a + 1.



Definition 2 (interval cover) We call a family of subsets $S \subseteq 2^U$ an interval 
cover (IC) of the interval set $I$ iff $\bigcup_{V \in S} V = I$ and all subsets $V$ are 
interval sets. If $|S| \le k$, $S$ is called a $k$-IC. 



Definition 3 (interval cover family) $F \subseteq 2^U$ is an interval cover family 
(ICF) of $U$ iff for every interval set $I \subseteq U$, there is a subset $S$ of $F$ such 
that $S$ is an IC of $I$. $F$ is a $k$-ICF of $U$ iff there is at least one $k$-IC 
$S \subseteq F$ of $I$ for every $I \subseteq U$. 

Lemma 4 Assume we have a $k$-ICF $F$ of the universe $U = \{1, \ldots, n\}$ and an 
arbitrary $R \subseteq U$ with $|R| = r$. Then $F$ covers $\bar{R}$ with at most $(r + 1)k$ 
interval sets. 

Proof: Notice that a subset $R$ of size $|R| = r$ partitions the interval $[1,n]$ into 
at most $r + 1$ subintervals of non-revoked users, 
$\bar{R}_1 \sqcup \cdots \sqcup \bar{R}_{r+1} = \bar{R}$. Thus, it suffices to cover these 
subintervals in order to cover $\bar{R}$. Since each $\bar{R}_i$ is coverable by $F$ with 
at most $k$ interval sets, the claim follows. □ 
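
The partition used in this proof can be made concrete with a small sketch. The function names below are hypothetical and not part of the paper; we assume that the revoked users are given as a set of integers in {1, ..., n}.

```python
def non_revoked_intervals(n, revoked):
    """Split {1, ..., n} minus `revoked` into maximal intervals (a, b), a <= b."""
    intervals = []
    start = 1  # smallest user not yet assigned to an interval or skipped
    for u in sorted(set(revoked)):
        if start <= u - 1:  # a non-empty run of non-revoked users ends at u - 1
            intervals.append((start, u - 1))
        start = u + 1
    if start <= n:  # trailing run after the last revoked user
        intervals.append((start, n))
    return intervals


def proof_bound(r, k):
    """Upper bound (r + 1) * k of Lemma 4 on the number of interval sets."""
    return (r + 1) * k
```

With r revoked users the function returns at most r + 1 intervals, and fewer whenever revoked users are adjacent or sit at the borders 1 and n.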



The number of interval sets needed to cover a set $\bar{R}$ in Lemma 4 corresponds 
to the maximal number of proofs, denoted $\mathcal{P}_F$, that the CA must send to the 
public directories during an update period. Hence, for a $k$-ICF $F$ the size of 
$\mathcal{P}_F$ is always upper-bounded by $(r + 1)k$. 

It is also important for the practicality of a revocation scheme that the size 
of $F$ is polynomial in $n$ and that for every subset $\bar{R}$ of non-revoked users the 
corresponding ICs can be computed in time polynomial in $\log(n)$. 

For an ICF $F$ and every $u \in U$, we define $h_F(u)$ as the multiplicity of $u$ 
in $F$, that is, the number of sets in $F$ containing the element $u$. Because every 
set in $F$ that contains $u$ can be part of an interval cover, user $u$’s certificate 
must include the 0-tokens of all $h_F(u)$ sets that contain $u$. Thus, the maximal 
number of 0-tokens, denoted $\mathcal{T}_F := \max_u\{h_F(u)\}$, corresponds to the 
maximal length of a user’s public key certificate. This length should be 
polynomial in $\log(n)$. There is a trade-off between the number of proofs 
$\mathcal{P}_F$ the CA must send to a public directory and the number of 0-tokens 
$\mathcal{T}_F$ in a revocation scheme. For instance, in the Micali scheme [5,6], we 
have $\mathcal{P} = n - r$ and $\mathcal{T} = 1$. In their Generalized scheme, ALO [1] had 
$\mathcal{P} \le r(\log_c(n/r) + 1)$ and $\mathcal{T} \le (2^{c-1} - 1)\log_c n$. 

In the next section, we present a $(2k - 1)$-ICF $F$ with $\mathcal{T}_F = O(kn^{2/k})$ for 
some system parameter $k$. Taking $k = \log_c n$ leads to $\mathcal{T} = O(c^2 \log_c n)$. 

3 A Revocation Scheme Using ICFs 

First, we introduce a notation on intervals. 

Definition 5 (combinational sum) The combinational sum of an interval 
$[a_1, b_1]$ with an interval $[a_2, b_2]$ is defined as the interval 
$[\min\{a_1, a_2\}, \max\{b_1, b_2\}]$. We also say that we combine interval $[a_1, b_1]$ 
with $[a_2, b_2]$. Let $W = \{[a_1, b_1], [a_2, b_2], \ldots, [a_m, b_m]\}$ be a set of 
disjoint intervals. We define the maximal combinational sum of $W$ that is 
contained in an interval $[a, b]$ as the interval 
$[\min_i\{a_i : a_i \ge a\}, \max_j\{b_j : b_j \le b\}]$. 

Note that the combinational sum of two intervals $[a_1, b_1]$ and $[a_2, b_2]$ may 
contain elements which are neither in $[a_1, b_1]$ nor in $[a_2, b_2]$. 

Next, we define an interval tree $T$ for the interval $[1,n]$ and a parameter $k$ 
that may depend on $n$. The construction is recursive. 



Construction of the interval tree T 

— The root is labelled with the interval $[1,n]$. 

— Each node labelled with an interval $[a, b]$ of length greater than 1 has 
$n^{1/k}$ children. The children partition the interval $[a, b]$ into equally long 
pieces. That is, the children are roots of the interval trees for the intervals 
$[a + i \cdot \frac{b-a+1}{n^{1/k}},\ a + (i + 1) \cdot \frac{b-a+1}{n^{1/k}} - 1]$, 
$0 \le i < n^{1/k}$ (for simplicity, we assume that $n^{1/k}$ is an integer to avoid 
rounding). 

We store the following contents in each node of the interval tree. 

— Each node stores the interval of its label. 

— Moreover, each node stores the combinational sums of its label with the 
labels of its right siblings. 

Let combinational sums of nodes be defined as the combinational sums of their 
labels. 




Fig. 1. The interval tree $T$ for $n = 3^3$, $k = 3$ 



Example: In Figure 1, the node with label [10, 12] stores the interval sets [10, 12], 
[10, 15] and [10, 18]. Its father [10, 18] stores the intervals [10, 18] and [10, 27]. 

Since in level $i$ the nodes are labelled with intervals of length $n^{(k-i)/k}$, in 
level $k$ we have interval length 1 and the recursive construction stops. Thus, the 
interval tree has depth $k$. 
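
The construction and the node contents can be sketched as follows. This is only an illustration under the simplifying assumption that $n = m^k$ for an integer branching factor $m = n^{1/k}$; the function names are hypothetical and not taken from the paper.

```python
def sibling_groups(n, k):
    """Return a list of (level, labels) pairs, one per set of siblings.

    Each label is an interval (a, b). Assumes n == m**k for an integer m.
    """
    m = round(n ** (1.0 / k))
    assert m ** k == n, "sketch assumes that n**(1/k) is an integer"
    groups = []
    parents = [(1, n)]  # level 0 holds only the root label [1, n]
    for level in range(1, k + 1):
        next_parents = []
        for (a, b) in parents:
            step = (b - a + 1) // m  # length of each child interval
            kids = [(a + i * step, a + (i + 1) * step - 1) for i in range(m)]
            groups.append((level, kids))
            next_parents.extend(kids)
        parents = next_parents
    return groups


def stored_intervals(siblings, j):
    """Intervals stored in the j-th sibling (0-based): its own label plus the
    combinational sums of its label with the labels of its right siblings."""
    a = siblings[j][0]
    return [(a, siblings[i][1]) for i in range(j, len(siblings))]
```

For $n = 27$ and $k = 3$ this reproduces the contents given in the example for the nodes $[10, 12]$ and $[10, 18]$.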

We define the ICF $F$ as the union of all the sets of intervals stored in the 
nodes of the interval tree $T$. However, we exclude the root label interval $[1,n]$. 

Next, we want to show that $F$ is a $(2k - 1)$-ICF, that is, we want to show 
that we can cover every interval set $I \subseteq [1,n]$ by at most $2k - 1$ sets in $F$. In 
order to prove this, we present an algorithm that needs a maximum of $2k - 1$ 
combinational sums for covering any interval $I$. 

Algorithm Cover Scheme 1 (CS1) 

INPUT: interval $I = [a,b]$ 

FOR level $i = 1$ TO $k$ in the interval tree $T$ DO 

    Take the maximal combinational sums of the label intervals in 
    level $i$ of $T$ that are contained in the yet uncovered parts of $[a, b]$. 

    IF $[a, b]$ is covered completely, EXIT. 

OUTPUT: interval sets $I_1, I_2, \ldots, I_m$ with $\bigcup_{j=1}^{m} I_j = [a, b]$, $m \le 2k - 1$. 

Example: In Figure 1, on input $I = [2, 21]$, Algorithm CS1 covers $I$ by taking 
the intervals [10,18] (level 1, stored in node [10,18]), [4,9] and [19,21] (level 2, 
stored in nodes [4,6] and [19,21]) and [2,3] (level 3, stored in [2]). 
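
A minimal sketch of the cover algorithm, again assuming $n = m^k$ with integer $m$, is given below. The name `cs1_cover` is hypothetical. One detail is our own reading rather than a statement of the paper: since combinational sums are only stored among siblings, the sketch emits one interval per parent whenever the contained labels of a level belong to two different parents. The sketch also does not special-case the excluded root interval $[1,n]$.

```python
def cs1_cover(a, b, n, k):
    """Cover [a, b] top-down by combinational sums of sibling labels."""
    m = round(n ** (1.0 / k))
    assert m ** k == n and 1 <= a <= b <= n
    uncovered = [(a, b)]
    cover = []
    for level in range(1, k + 1):
        length = n // m ** level  # label length in this level
        remaining = []
        for (x, y) in uncovered:
            # 0-based label j of this level covers [j*length + 1, (j+1)*length]
            first = (x - 1 + length - 1) // length  # first label starting at >= x
            last = y // length - 1                  # last label ending at <= y
            if first > last:  # no label of this level fits into [x, y]
                remaining.append((x, y))
                continue
            j = first
            while j <= last:  # emit one combinational sum per parent
                run_end = min(last, (j // m) * m + m - 1)
                cover.append((j * length + 1, (run_end + 1) * length))
                j = run_end + 1
            lo, hi = first * length + 1, (last + 1) * length
            if x < lo:
                remaining.append((x, lo - 1))
            if hi < y:
                remaining.append((hi + 1, y))
        uncovered = remaining
        if not uncovered:
            break
    return cover
```

Calling `cs1_cover(2, 21, 27, 3)` yields the four intervals of the example in the same order.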

Lemma 6 The union $F$ of all intervals stored in the interval tree $T$ is a 
$(2k - 1)$-ICF. 

Proof: We have to show that CS1 needs at most $2k - 1$ interval sets to cover 
$[a, b]$. Notice that CS1 covers the whole interval $[a, b]$ successively from the 
middle to the borders. In level 1 of the interval tree $T$, one gets at most one 
combinational sum $[a_1, b_1]$. The uncovered parts $[a, a_1 - 1]$ and $[b_1 + 1, b]$ both 
yield at most one additional interval in level 2. This holds because the maximal 
combinational sums in level 2 are always of the form $[a_2, a_1 - 1]$ and 
$[b_1 + 1, b_2]$, respectively. Analogously, we get at most two additional intervals 
in the subsequent levels. This leads to the upper bound of $2k - 1$. □ 

We define the memory requirement $|F|$ of an ICF $F$ to be the number of interval 
sets in $F$. The running time of a $k$-ICF $F$ on input $I = [a, b]$ is the time to find 
a $k$-IC $S$ for $I$. Further, we define the running time of a $k$-ICF to be the maximal 
running time taken over all choices of input intervals $I$. The following lemma 
shows that our $(2k - 1)$-ICF $F$ can be efficiently implemented. 

Lemma 7 The ICF $F$ has memory requirement $O(n^{1+1/k})$ and running time 
$O(kn^{1/k})$. 

Proof: In every set of siblings at most $\frac{n^{1/k}(n^{1/k}+1)}{2} = O(n^{2/k})$ intervals 
are stored. This follows from the fact that each node contains its label and all 
combinational sums with its right siblings. Hence, $F$ contains at most 
$O(n^{2/k}) \cdot \sum_{i=0}^{k-1} n^{i/k} = O(n^{1+1/k})$ interval sets. 

The operations in each level can easily be implemented to run in time $O(n^{1/k})$, 
that is, in the number of children. Thus, the total running time is $O(kn^{1/k})$. □ 



Definition 8 (RS1) Revocation Scheme 1 (RS1) uses the $(2k - 1)$-ICF $F$ and 
Algorithm CS1 in order to cover all interval sets $\bar{R} = U - R$ of non-revoked 
users. 

Theorem 9 RS1 is a revocation scheme with $\mathcal{P}_F \le (r + 1)(2k - 1)$ and 
$\mathcal{T}_F \le \frac{k}{4}(n^{1/k} + 1)^2$. 

Proof: The number of proofs $\mathcal{P}_F \le (r + 1)(2k - 1)$ follows from Lemma 4. 

It remains to show the upper bound for $\mathcal{T}_F$. Because the node labels in each 
level partition the interval $[1,n]$, every element $u \in U$ is stored in exactly one 
label per level. We want to determine the number of interval sets of a single level 
in which a user $u$ is contained. Therefore, consider the node with the label 
interval containing $u$. Combinational sums are only taken among the $n^{1/k}$ siblings 
of this node. When enumerating the siblings from left to right, it is easy to see 
that the $i$-th sibling is in exactly $i \cdot (n^{1/k} - i + 1)$ interval sets. This function 
in $i$ takes its maximum for $i = (n^{1/k} + 1)/2$, leading to 
$\max_i\{i \cdot (n^{1/k} - i + 1)\} = \left(\frac{n^{1/k}+1}{2}\right)^2$. Hence, user $u$ can be in at 
most $\left(\frac{n^{1/k}+1}{2}\right)^2$ interval sets per level. Summing over the $k$ levels, we 
obtain $\mathcal{T}_F \le \frac{k}{4}(n^{1/k} + 1)^2$. □ 
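
The counting argument for a single level can be checked numerically. The sketch below uses hypothetical helper names and takes $m = n^{1/k}$ as an input parameter; it compares the closed form $i \cdot (m - i + 1)$ with a direct enumeration of the stored intervals.

```python
def multiplicity_closed_form(i, m):
    """Stored intervals of one sibling set (m siblings) containing sibling i (1-based)."""
    return i * (m - i + 1)


def multiplicity_brute_force(i, m):
    """Count pairs (j, l), j <= l, whose combinational sum contains sibling i."""
    return sum(1 for j in range(1, m + 1)
                 for l in range(j, m + 1) if j <= i <= l)


def token_bound(m, k):
    """Upper bound k/4 * (m + 1)^2 on the number of 0-tokens, with m = n^(1/k)."""
    return k * (m + 1) ** 2 / 4
```

For $m = 3$ and $k = 3$ the per-level maximum is 4, attained by the middle sibling, and the resulting bound is 12.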



Corollary 10 Choosing $k = \log_c n$ for some constant $c$, we obtain a 
$(2\log_c n - 1)$-ICF $F$ that needs $O(n)$ memory and $O(\log_c n)$ running time. RS1 is 
a key revocation scheme with $\mathcal{P}_F \le (r + 1)(2 \log_c n - 1)$ and 
$\mathcal{T}_F \le \frac{(c+1)^2}{4} \log_c n$. 

Note that our result improves upon the Generalized scheme of ALO, who had 
$\mathcal{T} \le (2^{c-1} - 1) \log_c n$. A refined analysis of $\mathcal{P}_F$ is given in Section 5. 

Even with the refined analysis, there remain two problems with the $(2k - 1)$- 
ICF presented above. First, we always assume an upper bound of $2k - 1$ for the 
number of intervals taken by Algorithm CS1. Consider a small interval $[a, b]$ with 
length much shorter than $n$. Algorithm CS1 will take its first combinational sum 
in a level $i$ that is close to the leaves in level $k$. It is easy to see that in this 
case, CS1 outputs at most $2(k - i) + 1$ interval sets. Therefore, Lemma 6 gives a 
pessimistic bound. Second, after using the first combinational sum in level $i$, we 
just need combinational sums of the rightmost or leftmost sibling nodes in the 
subsequent levels. But we store combinational sums of all sibling nodes. In the 
next section, we show how to avoid these problems. 

4 Another Revocation Scheme Based on ICFs 

We take an interval tree T' similar to the interval tree T in Section 3. The nodes 
and their labels remain the same as in T, only their content is changed. 

Definition 11 (partial sums) For each set of sibling nodes in a tree, we call 
the combinational sums of the leftmost sibling $v$ with all other siblings to the 
right the right partial sums. The combinational sums of $v$’s father’s leftmost 
sibling with $v$ and all of $v$’s siblings are called the upper right partial sums 
(for an example, see below). The combinational sums of the rightmost sibling $w$ 
with all other siblings to the left, except the leftmost sibling, are called the 
left partial sums. Analogously, the combinational sums of $w$’s father’s rightmost 
sibling with $w$ and all of $w$’s siblings are called the upper left partial sums. 

Let $W = \{[a_1, b_1], [a_2, b_2], \ldots, [a_m, b_m]\}$ be a set of partial sums. We define 
the maximal partial sum of $W$ that is contained in an interval $[a, b]$ as the interval 
$[\min_i\{a_i : a_i \ge a\}, \max_j\{b_j : b_j \le b\}]$. 

Example: In Figure 1, the right partial sums of the set {[13], [14], [15]} of sibling 
nodes are [13], [13,14] and [13,15]. The left partial sums are [15] and [14,15]. 
The upper right partial sums are the interval sets [10, 13], [10, 14], [10, 15] and 
the upper left partial sums are [15, 18], [14, 18] and [13, 18]. 

Notice that we should omit those upper right partial sums where the father 
of the leftmost sibling $v$ is itself a leftmost sibling, since these combinational 
sums always yield the label of the father. The same holds analogously for the 
upper left partial sums. 



Node contents of the interval tree T' 

— Each node stores its label. 

— For any sibling nodes in level $k - 2j$, $0 \le j < k/2$, the leftmost sibling $v$ 
stores the right partial sums. Additionally, $v$ stores the upper right partial 
sums. 

— For any sibling nodes in level $k - 2j$, $0 \le j < k/2$, the rightmost sibling $w$ 
stores the left partial sums. In addition, $w$ stores the upper left partial sums. 

— Any sibling nodes in level $i$ of the recursion tree are divided into $i$ equally 
large parts. In any part, each node stores the combinational sums of its label 
with the labels of its right siblings in this part. 

Again, the ICF F' is defined as the union of all intervals stored in the nodes. 

Algorithm Cover Scheme 2 (CS2) 

INPUT: interval $I = [a,b]$ 
level $i := 1$ 

UNTIL a combinational sum is taken DO 

    Take the maximal combinational sum of the label intervals 
    in level $i$ of $T'$ that is contained in $[a,b]$. (That combinational 
    sum may consist of up to $i$ intervals.) $i := i + 1$ 

FOR level $j = i + ((k - i) \bmod 2)$ TO $k$ STEP 2 DO 

    Take the maximal partial sums of the yet uncovered parts of $[a,b]$. 

    IF $[a, b]$ is covered completely, EXIT. 

OUTPUT: interval sets $I_1, I_2, \ldots, I_m$ with $\bigcup_{j=1}^{m} I_j = [a, b]$, $m \le k + 1$. 

Example: We cover the interval [2,21] using Algorithm CS2 and the interval 
tree $T'$ of Figure 1. CS2 outputs the combinational sum [10, 18] in level 1 and 
the upper partial sums [2,9] and [19,21] in level 3. 

Lemma 12 The union $F'$ of all intervals stored in the nodes of the interval tree 
$T'$ is a $(k + 1)$-ICF. 

Proof: Let Algorithm CS2 take a combinational sum in level $i$. Since in level 
$i$, maximal combinational sums of label intervals can be divided into $i$ parts, we 
take at most $i$ intervals. In the remaining $k - i$ levels, CS2 can take at most 2 
intervals in each of the levels $k - 2j$, $0 \le j < (k - i)/2$. These are at most 
$\lceil (k - i)/2 \rceil$ levels. Thus, we obtain the upper bound 
$i + 2 \cdot \lceil \frac{k-i}{2} \rceil \le i + 2 \cdot \frac{k-i+1}{2} = k + 1$. □ 



Lemma 13 The $(k+1)$-ICF $F'$ needs memory and $O(kn^{1/k})$ running 
time. 

Proof: The memory requirement of $F'$ is the amount of partial sums and 
combinational sums. We have at most $2n^{1/k}$ right and left partial sums per set 
of siblings. The upper partial sums sum up to another $2n^{1/k}$ intervals. Ignoring 
that these intervals are only taken in each second level we get an upper bound 
of (^i/fe)i _ fQj. partial sums. Further, each set of siblings 

stores combinational sums. Summing over the levels gives an upper 

bound of which is also an upper bound for 

the total memory requirement. 

Since the operations in each level can be implemented to run in time $O(n^{1/k})$, 
the running time is $O(kn^{1/k})$. □ 



Definition 14 (RS2) Revocation Scheme 2 (RS2) uses the $(k+1)$-ICF $F'$ 
and Algorithm CS2 in order to cover all interval sets $\bar{R} = U - R$ of non-revoked 
users. 

Theorem 15 The $(k + 1)$-ICF $F'$ yields a revocation scheme with 
$\mathcal{P}_{F'} \le (r + 1)(k + 1)$ and 
$\mathcal{T}_{F'} \le \frac{12k + \pi^2}{24} \cdot n^{2/k} + \frac{k}{4} \cdot (2n^{1/k} + 1) + \frac{1}{2}(\log k + 1)n^{1/k}$. 

Proof: The upper bound for the number of proofs $\mathcal{P}_{F'}$ follows from Lemma 4. 
To complete the proof, we must show that 
$\mathcal{T}_{F'} \le \frac{12k + \pi^2}{24} \cdot n^{2/k} + \frac{k}{4} \cdot (2n^{1/k} + 1) + \frac{1}{2}(\log k + 1)n^{1/k}$. 
Let us start with the partial sums and consider a set of sibling nodes as 
enumerated from left to right. It is easy to see that the $i$-th sibling is in 
$n^{1/k} - i + 1$ right partial sums and in $i - 1$ left partial sums. This gives a 
total of $n^{1/k}$ partial sums for each element $u \in U$. Analogously, one can show 
that each element $u$ is in $n^{1/k}$ upper partial sums. Since each upper partial sum 
consists of $n^{1/k}$ combinational sums, we get another $n^{2/k}$ intervals. Thus, we 
have a total of $n^{2/k} + n^{1/k}$ partial sums for every element $u \in U$ in the levels 
$k - 2j$, $0 \le j < k/2$. Summing over these levels gives us an upper bound of 
$\frac{k}{2} \cdot (n^{2/k} + n^{1/k})$. 

In addition to the partial sums, we divide all sibling nodes in level $i$ of $T'$ into 
$i$ parts of size $n^{1/k}/i$ and compute the combinational sums with their right 
siblings in that part. Analogous to the proof of Theorem 9, every element in level 
$i$ is in no more than $\frac{1}{4}\left(\frac{n^{1/k}}{i} + 1\right)^2$ of these intervals. Summing over 
the levels gives 

$\sum_{i=1}^{k} \frac{1}{4}\left(\frac{n^{1/k}}{i} + 1\right)^2 = \frac{1}{4}\left(\sum_{i=1}^{k} \frac{n^{2/k}}{i^2} + \sum_{i=1}^{k} \frac{2n^{1/k}}{i} + \sum_{i=1}^{k} 1\right) \le \frac{1}{4}\left(\frac{\pi^2}{6} \cdot n^{2/k} + 2(\log k + 1) \cdot n^{1/k} + k\right)$. 

Together with the partial sums computed before we get the desired upper bound 
for the number of 0-tokens 

$\mathcal{T}_{F'} \le \frac{12k + \pi^2}{24} \cdot n^{2/k} + \frac{k}{4} \cdot (2n^{1/k} + 1) + \frac{1}{2}(\log k + 1)n^{1/k}$. □ 



Corollary 16 Taking $k = \log_c n$, RS2 is a revocation scheme with 
$\mathcal{P}_{F'} \le (r + 1)(\log_c n + 1)$ and $\mathcal{T}_{F'} \le \frac{c^2}{2} \cdot \log_c n \, (1 + o(1))$. 

If we compare this result with the revocation scheme RS1 of Section 3, we 
roughly halve the number of proofs $\mathcal{P}$ by doubling $\mathcal{T}$. Since update 
periods occur frequently, it is preferable to make $\mathcal{P}$ small by slightly 
enlarging $\mathcal{T}$. 

5 ICFs and Key Revocation 

In the previous sections, we studied the covering of arbitrary intervals [a, b] by 
ICFs and connected this to the task of key revocation by Lemma 4. But Lemma 4 
yields a pessimistic bound: 

— We always expect that $r$ revoked users yield $r + 1$ intervals of non-revoked 
users. This is no longer true if $r$ becomes large. 

— The intervals representing non-revoked users are not arbitrary but disjoint, 
that is the intervals do not overlap. Further, the average length of the inter- 
vals that have to be covered depends on the parameter r. 

5.1 The Expected Number of Intervals 

In the following, we assume that the revoked users $R$ are uniformly distributed 
over the interval $[1,n]$ and look for the expected number of intervals of non- 
revoked users. Let $i_1, i_2, \ldots, i_r$ be the revoked users in sorted order, that is 
$i_1 < i_2 < \cdots < i_r$. We call $i_j$ and $i_{j+1}$ a pair iff $i_{j+1} = i_j + 1$. Note that 
pairs of revoked users do not introduce a new interval that must be covered, since 
they enclose an interval of non-revoked users of size 0. Let $X$ be the random 
variable for the number of intervals. Then 

$E_R(X) \le r + 1 - E_R(\text{number of pairs})$. 

We obtain an upper bound since revoked users at the interval borders 1 and $n$ 
never yield an additional interval. Thus, the borders 1 and $n$ always pair. 

The expected number of pairs is 

$E_R(\text{number of pairs}) = (n - 1) \cdot \frac{\binom{n-2}{r-2}}{\binom{n}{r}} = \frac{r(r - 1)}{n}$. 

We summarize this in the following lemma. 

Lemma 17 Let the revoked users be distributed uniformly over $[1,n]$ and let $F$ 
be a $k$-ICF. Then $F$ yields a key revocation system with an expected upper bound 
of $\mathcal{P}_F \le \left(r + 1 - \frac{r(r-1)}{n}\right) k$ for the CA-to-directory communication. 
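
The expectation can be verified exactly for small parameters by enumerating all revocation sets of size r. The sketch below is our own illustration with hypothetical function names; it uses exact rational arithmetic, so no sampling error is involved.

```python
from fractions import Fraction
from itertools import combinations


def count_pairs(revoked):
    """Number of adjacent revoked users (x, x + 1) in a sorted tuple."""
    return sum(1 for x, y in zip(revoked, revoked[1:]) if y == x + 1)


def count_intervals(n, revoked):
    """Number of maximal intervals of non-revoked users in {1, ..., n}."""
    bounds = (0,) + tuple(revoked) + (n + 1,)  # sentinels at both borders
    return sum(1 for x, y in zip(bounds, bounds[1:]) if y > x + 1)


def expectations(n, r):
    """Exact expected (pairs, intervals) over all r-subsets of {1, ..., n}."""
    subsets = list(combinations(range(1, n + 1), r))
    pairs = Fraction(sum(count_pairs(s) for s in subsets), len(subsets))
    intervals = Fraction(sum(count_intervals(n, s) for s in subsets),
                         len(subsets))
    return pairs, intervals
```

For every small choice of n and r, the expected number of pairs returned by `expectations` equals r(r - 1)/n, and the expected number of intervals stays below r + 1 minus that value.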



5.2 Key Revocation with RS1 for Growing r 

Lemmas 4 and 17 still give pessimistic bounds, since they assume that arbitrary 
intervals are covered. But CS1 does not always use $2k - 1$ intervals to cover an 
interval $[a, b]$. If the interval length of $[a, b]$ is small, Algorithm CS1 will not 
use any intervals in the upper levels of the interval tree $T$. However, if the 
number $r$ of revoked users increases, then the average length of the intervals 
that must be covered decreases. Hence, we expect some amortization of costs with 
growing $r$. 

This fact was studied in ALO [1]. They proved an upper bound of 
$\mathcal{P} \le r \log_2(\frac{n}{r})$. Note that the logarithmic term decreases with increasing 
$r$. We show our algorithm to be a generalization of [1] by showing a bound of 
$2r(\log_c n - \lfloor \log_c r \rfloor) - m$ for arbitrary $c \ge 2$ and 
$m = r - c^{\lfloor \log_c r \rfloor}$. To this end, we adapt the proof techniques of [1]. 
For $c = 2$, our scheme reduces to the Hierarchical Scheme proposed in ALO [1]. 

Definition 18 Let $P(n, R)$ be the number of proofs using the revocation scheme 
RS1 for covering $U - R$, where $U = \{1, 2, \ldots, n\}$. We define 
$P(n, r) = \max_{R, |R| = r}\{P(n, R)\}$ to be the worst case number of proofs for a 
revocation set $R$ of $r$ users. 

Assume $n = c^k$. 

Theorem 19 For $r = c^l$, $l \ge 0$, RS1 has $P(n, r) \le 2r \log_c(\frac{n}{r})$ for $c \ge 2$, 
$c \in \mathbb{N}$. 

Proof: Similar to the proof in ALO [1]. The proof is given in the full version 
of the paper. □ 

In the following theorem, we prove the upper bound for $P(n, r)$ for arbitrary $r$. 

Theorem 20 For $r = c^l + m$, RS1 yields 
$P(n, r) \le 2r(\log_c n - \lfloor \log_c r \rfloor) - m$. 

Proof: The proof is given in the full version of the paper. □ 







Johannes Blomer and Alexander May 



6 Lower Bounds for $\mathcal{T}_F$ in a k-ICF F 

In this section, we show lower bounds for the number of tokens $\mathcal{T}_F$. Let 
$U = \{1, 2, \ldots, n\}$ be the set of users. We cover arbitrary interval sets 
$[a, b] \subseteq U$ of non-revoked users by $k$-ICFs. Comparing the lower bounds to our 
results in Sections 3 and 4 will prove that our revocation schemes are optimal up 
to constants. 

Theorem 21 Let $F$ be a $k$-ICF of $U$, then 
$\mathcal{T}_F \ge \sqrt[k]{k!} \cdot \sqrt[k]{n + 1} - k$. 

Proof: We prove a lower bound for covering the interval sets $[1, 1], [1, 2], \ldots, 
[1, n]$ with the $k$-ICF $F$. This yields a lower bound for covering all interval 
sets $I \subseteq U$. The bound is proven by induction. For $k = 1$ an optimal family $F_1$ 
covering these sets must contain all of the $n$ interval sets. But each of these sets 
contains the element 1. Thus, $\mathcal{T}_{F_1} = h_{F_1}(1) = n$. The identity 

$\mathcal{T}_{F_k} = h_{F_k}(1)$, (1) 

is an invariant of the proof, where $F_k$ denotes an optimal covering scheme with 
at most $k$ interval sets. The inductive step is done from $k - 1$ to $k$. 

Assume there is an optimal family $F_k$ covering the sets $[1, 1], [1, 2], \ldots, [1, n]$ 
with at most $k$ interval sets and minimal $\mathcal{T}_{F_k}$. We show in Lemma 24 that we 
can assume wlog $\mathcal{T}_{F_k} = h_{F_k}(1)$. Hence, invariant (1) holds. 

Now, consider the interval sets of $F_k$ containing 1. Let these be the sets 
$[1, a_1], [1, a_2], \ldots, [1, a_t]$, where $a_1 = 1$ because $F_k$ must cover the single 
user 1. By construction, $t = \mathcal{T}_{F_k}$. An auxiliary set is defined by $[1, a_{t+1}]$ 
with $a_{t+1} = n + 1$. The interval sets $[1, a_{i+1} - 1]$, $1 \le i \le t$, are covered by 
taking the interval set $[1, a_i]$ and an optimal covering in $F_k$ of the remaining 
interval $[a_i + 1, a_{i+1} - 1]$ with at most $k - 1$ sets. 

The element $a_i + 1$ is the first element in the interval $[a_i + 1, a_{i+1} - 1]$. 
Hence, it plays the role of the element 1 when covering the sets 
$[a_i + 1, a_i + 1], [a_i + 1, a_i + 2], \ldots, [a_i + 1, a_{i+1} - 1]$. By equation (1), 
the element $a_i + 1$ is critical, because it has maximal multiplicity of all the 
elements in $[a_i + 1, a_{i+1} - 1]$. Thus, element $a_i + 1$ must be contained in 
$h_{F_{k-1}}(1)$ interval sets and the induction hypothesis applies with $k - 1$ and 
interval length $a_{i+1} - a_i - 1$. Additionally, $a_i + 1$ is contained in the $t - i$ 
interval sets $[1, a_{i+1}], \ldots, [1, a_t]$. Since $h_{F_k}(1) = t$ 
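and 1 is the element of maximal multiplicity, we obtain for $1 \le i \le t$ 

$\sqrt[k-1]{(k - 1)!} \cdot (a_{i+1} - a_i)^{\frac{1}{k-1}} - (k - 1) + t - i \le t$ (2) 

$a_{i+1} \le a_i + \frac{(i + k - 1)^{k-1}}{(k - 1)!}$ (3) 

Solving the recurrence in (3) for $a_1 = 1$ yields 

$a_{i+1} \le 1 + \sum_{j=1}^{i} \frac{(j + k - 1)^{k-1}}{(k - 1)!} \le 1 + \int_{0}^{i} \frac{(j + k)^{k-1}}{(k - 1)!} \, dj \le \frac{(i + k)^k}{k!}$. 

But we know $a_{t+1} = n + 1$, which leads to 

$\frac{(t + k)^k}{k!} \ge n + 1 \quad \Longrightarrow \quad t \ge \sqrt[k]{k!} \cdot \sqrt[k]{n + 1} - k$. 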



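Using Stirling’s formula $k! \sim \sqrt{2\pi k}\,(\frac{k}{e})^k \ge (\frac{k}{e})^k$, we conclude 

Corollary 22 

$\mathcal{T}_F \ge \frac{k}{e} \cdot (n + 1)^{1/k} - k > \left(\frac{n^{1/k}}{e} - 1\right) \cdot k$. 

Taking $k = \log_c n$ yields 

Corollary 23 

$\mathcal{T}_F \ge \left(\frac{c}{e} - 1\right) \cdot \log_c n$. □ 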



Lemma 24 Let $F_k$ be a $k$-ICF covering $[1,1], [1,2], \ldots, [1,n]$ with minimal 
$\mathcal{T}_{F_k}$. $F_k$ can be turned into a $k$-ICF with $h_{F_k}(1) = \mathcal{T}_{F_k}$. 

Proof: The proof is given in the full version of the paper. □ 



Theorem 25 Let $F$ be a $k$-ICF of $U$, then 
$\mathcal{T}_F \ge \left(\frac{1}{6}\right)^{\frac{1}{k-1}} n^{\frac{2}{k}}$. 

Proof: Let $F = \{S_1, S_2, \ldots, S_m\}$. Since $F$ is a $k$-ICF, for every interval set 
$I \subseteq U$ there exist at most $k$ interval sets $S_{i_1}, S_{i_2}, \ldots, S_{i_k}$ such that 
$S(I) := \{S_{i_1}, S_{i_2}, \ldots, S_{i_k}\}$ is a cover of $I$. Note that some $S_{i_j}$ might 
be empty and there might be several $S(I)$ in $F$ that cover $I$. For each interval 
set $I$ we consider an arbitrary but fixed $S(I)$. 

Assume $\mathcal{T}_F < \left(\frac{1}{6}\right)^{\frac{1}{k-1}} n^{\frac{2}{k}}$. Fix some interval set 
$S_i = [a, b]$ of $F$ and consider the number of times this set can be contained in a 
cover $S(I)$ of some interval set $I = [s, t]$, where $s < a$ or $b < t$. Consider the 
case $s < a$. By assumption, user $a - 1$ is contained in at most 
$\left(\frac{1}{6}\right)^{\frac{1}{k-1}} n^{\frac{2}{k}}$ sets of $F$. On the other hand, every cover 
$S(I)$ containing $S_i = [a, b]$ must contain a set $S_j$ which includes the element 
$a - 1$. Next consider the interval $S_j \cup S_i = [c, d]$. Assuming $s < c$ and arguing 
as above, $S(I)$ must contain one of the intervals containing $c - 1$. 

Continuing in this way and using the fact that $F$ is a $k$-ICF, we conclude that 
there are at most $\frac{1}{6} n^{\frac{2(k-1)}{k}}$ covers $S(I)$ in which a set $S_i$ can 
participate. This holds for any $S_i$: 

$|\{I \subseteq U : S_i \in S(I)\}| \le \frac{1}{6} n^{\frac{2(k-1)}{k}}$. (4) 

Next, we count the number of elements with multiplicities in all the 
$\binom{n+1}{2}$ interval sets $I$ in $U$. Since there is one interval of length $n$, two 
intervals of length $n - 1$, etc., we get 
$\sum_{I \subseteq U} |I| = \sum_{i=1}^{n} i \cdot (n - i + 1) = \frac{n(n+1)(n+2)}{6} > \frac{n^3}{6}$. 
Since the interval sets $S_i$ that cover $I$ can overlap, the following inequality 
holds 

$\sum_{I \subseteq U} \sum_{S_i \in S(I)} |S_i| \ge \sum_{I \subseteq U} |I| > \frac{n^3}{6}$. (5) 
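
The counting step can be checked by brute force for small n. The sketch below uses hypothetical function names and simply sums the lengths of all interval sets in {1, ..., n}; it assumes that the sum in question equals n(n+1)(n+2)/6, which is what the identity above evaluates to.

```python
def total_interval_size(n):
    """Sum of the lengths of all interval sets [a, b] with 1 <= a <= b <= n."""
    return sum(b - a + 1 for a in range(1, n + 1) for b in range(a, n + 1))


def number_of_intervals(n):
    """Number of interval sets [a, b] with 1 <= a <= b <= n."""
    return sum(1 for a in range(1, n + 1) for b in range(a, n + 1))
```

For n = 10 there are 55 interval sets with a total size of 220, which exceeds $n^3/6 \approx 166.7$.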



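Using inequality (4), we also obtain 

$\sum_{I \subseteq U} \sum_{S_i \in S(I)} |S_i| = \sum_{S_i \in F} \sum_{I : S_i \in S(I)} |S_i| \le \frac{1}{6} n^{\frac{2(k-1)}{k}} \sum_{S_i \in F} |S_i|$. (6) 

Combining (5) and (6) leads to 

$\sum_{S_i \in F} |S_i| > n^{3 - \frac{2(k-1)}{k}} = n^{1 + \frac{2}{k}}$. 

Note that $\sum_{S_i \in F} |S_i| = \sum_{u \in U} h_F(u)$. Taking the average of the 
multiplicities $h_F(u)$ yields that there must be an element $u$ with 
$h_F(u) > n^{\frac{2}{k}}$, contradicting the assumption that each element is in at most 
$\left(\frac{1}{6}\right)^{\frac{1}{k-1}} n^{\frac{2}{k}}$ sets. □ 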



Definition 26 We call a $k$-ICF $F$ $\delta$-optimal, if for all $k$-ICFs $\tilde{F}$: 
$\frac{\mathcal{T}_F}{\mathcal{T}_{\tilde{F}}} = O(\delta)$. 

Theorem 27 RS1 uses a $\min\{n^{3/k}, kn^{2/k}\}$-optimal $k$-ICF. The $(k + 1)$-ICF 
$F'$ constructed for RS2 is $\min\{n^{1/k}, k\}$-optimal. 

Proof: RS1 uses a $(2k - 1)$-ICF $F$ with $\mathcal{T}_F = O(kn^{2/k})$. This can be turned 
into a $k$-ICF with $\mathcal{T}_F = O(kn^{4/k})$. Dividing by the lower bounds of 
Corollary 22 and Theorem 25 gives the $\min\{n^{3/k}, kn^{2/k}\}$-optimality. RS2 uses 
a $(k + 1)$-ICF $F'$ with $\mathcal{T}_{F'} = O(kn^{2/k})$. Applying Corollary 22 and 
Theorem 25 proves the claim. □ 

Corollary 28 For $k = \log_c n$, the $(2k - 1)$-ICF $F$ used in RS1 and the $(k + 1)$- 
ICF $F'$ used in RS2 are 1-optimal. 

Note that we obtain the lower bound in Theorem 21 by covering the intervals 
$[1,1], [1,2], \ldots, [1,n]$. This is only a small subset of all the intervals in $[1,n]$ 
and the left border is fixed by the element 1. It seems that making both borders 
variable introduces a factor of $n^{1/k}$, but we cannot prove this yet. Thus, we 
expect a lower bound of $\mathcal{T}_F = \Omega(kn^{2/k})$ for any $k$-ICF $F$. This would yield 
1-optimality for the ICF $F'$ in RS2 independent of the choice of $k$. 

7 Conclusion 

We introduced a new class called ICF for key revocation. Micali’s scheme [5,6] 
and ALO’s Hierarchical scheme [1] belong to this class. We improved upon the 
former results by reducing the critical update cost for CA-to-directory commu- 
nication. In practice, the performances of our revocation schemes depend on the 
expected number r of revoked users. If one expects r to be a small fraction of n, 
then RS2 is preferable. It avoids a factor of 2 in the communication. RS1 should 
perform better for large r. We have shown the first lower bounds in this area, 
proving our schemes to be optimal up to constants. 

Abstract. Let $n$ be a large composite number. Without factoring $n$, 
the computation of $a^{2^t} \pmod n$ given $a$, $t$ with $\gcd(a, n) = 1$ and $t < n$ 
can be done in $t$ squarings modulo $n$. For $t \ll n$ (e.g., $n > 2^{1024}$ and 
$t < 2^{100}$), no lower complexity than $t$ squarings is known to fulfill this 
task. Rivest et al. suggested using such constructions as good candidates 
for realising timed-release crypto problems. 

We argue the necessity for a zero-knowledge proof of the correctness of 
such constructions and propose the first practically efficient protocol for 
a realisation. Our protocol proves, in $\log_2 t$ standard crypto operations, 
the correctness of $(a^e)^{2^t} \pmod n$ with respect to $a^e$ where $e$ is an RSA 
encryption exponent. With such a proof, a Timed-release Encryption of 
a message $M$ can be given as $a^{2^t} M \pmod n$ with the assertion that the 
correct decryption of the RSA ciphertext $M^e \pmod n$ can be obtained 
by performing $t$ squarings modulo $n$ starting from $a$. Timed-release RSA 
signatures can be constructed analogously. 

Keywords: Timed-release cryptography, Time-lock puzzles, 
Non-parallelisability, Efficient zero-knowledge protocols. 



1 Introduction 

Let $n$ be a large composite natural number. Given $t < n$ and $\gcd(a, n) = 1$, 
without factoring $n$, the validation of 

$X \equiv a^{2^t} \pmod n$ (1) 

can be done in $t$ squarings mod $n$. However, if $\phi(n)$ (Euler’s phi function of $n$) 
is known, then the job can be completed in $O(\log n)$ multiplications via the 
following two steps: 

$u := 2^t \pmod{\phi(n)}$, (2) 

$X := a^u \pmod n$. (3) 

For $t \ll n$ (e.g., $n > 2^{1024}$ and $t < 2^{100}$), it can be anticipated that factoring 
of $n$ (and hence computing $\phi(n)$ for performing the above steps) will be much 
more difficult than performing $t$ squarings. Under this condition we do not know 
any other method which, without using the factorisation of $n$, can compute 
$a^{2^t} \pmod n$ in time less than $t$ squarings. Moreover, because each squaring can 
only be performed on the result of the previous squaring, it is not known how to 
speed up the $t$ squarings via parallelisation of multiple processors. 
Parallelisation of each squaring step cannot achieve a great deal of speedup 
since a squaring step only needs a trivial computational resource and so any 
non-trivial scale of parallelisation of a squaring step is likely to be penalised 
by communication delays among the processors. 
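
The difference between the two ways of computing $a^{2^t} \bmod n$ can be illustrated by a toy sketch. The function names and the tiny primes used in the usage note are our own choices for illustration only; a real instance would use a modulus of the size discussed above.

```python
def solve_by_squaring(a, t, n):
    """Compute a^(2^t) mod n by t sequential squarings (no trapdoor needed)."""
    x = a % n
    for _ in range(t):
        x = (x * x) % n  # each step depends on the result of the previous one
    return x


def solve_with_trapdoor(a, t, p, q):
    """Compute a^(2^t) mod n for n = p*q using phi(n), as in steps (2) and (3).

    Assumes p and q are distinct primes and gcd(a, p*q) = 1.
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    u = pow(2, t, phi)   # step (2): reduce the exponent modulo phi(n)
    return pow(a, u, n)  # step (3): one modular exponentiation
```

With $p = 1009$, $q = 1013$, $a = 5$ and $t = 1000$, both functions return the same value, but the first one needs 1000 sequential squarings while the second needs only two modular exponentiations.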

These properties suggest that the following language (notice that each element 
in the language is associated with a non-secret natural number $t$) 

$L(a, n) = \{a^{2^t} \pmod n \mid \gcd(a, n) = 1,\ t = 1, 2, \ldots\}$ (4) 

forms a good candidate for the realisation of timed-release crypto problems. 
Rivest, Shamir and Wagner pioneered the use of this language in a time-lock 
puzzle scheme [11]. In their scheme a puzzle is a triple $(t, a, n)$ and the 
instruction for finding its solution is to perform $t$ squarings mod $n$ starting 
from $a$, which leads to $a^{2^t} \pmod n$. A puzzle maker, with the factorisation 
knowledge of $n$, can construct a puzzle efficiently using the steps in (2) and (3) 
and can fine tune the difficulty of finding the solution by choosing $t$ in a vast 
range. For instance, the MIT Laboratory for Computer Science (LCS) has 
implemented the time-lock puzzle of Rivest et al. as “The LCS35 Time Capsule 
Crypto-Puzzle” and started its solving routine on 4th April 1999. It is estimated 
that the solution to the LCS35 Time Capsule Crypto-Puzzle will be found 35 years 
from 1999, or 70 years from the inception of the MIT-LCS [10]. (Though we will 
discuss a problem of this puzzle in §1.2.) 

1.1 Applications 

Boneh and Naor used a sub-language of $L(a,n)$ (details to be discussed in §1.2) 
and constructed a timed-release crypto primitive which they called “timed 
commitments” [3]. Besides several suggested applications, they suggested an 
interesting use of their primitive for solving a long-standing problem in fair 
contract signing. A previous solution (due to Damgård [6]) for fair contract 
signing between two remote and mutually distrusting parties is to let them 
exchange signatures of a contract via gradual release of a secret. A major 
drawback of that solution is that it only provides weak fairness. Let us describe 
this weakness by using, for example, a discrete-logarithm based signature scheme. 
A signature being gradually released relates to a series of discrete logarithm 
problems with the discrete logarithm values having gradually decreasing 
magnitudes. Sooner or later, before the two parties complete their exchange, one 
of them may find himself in a position of extracting a discrete logarithm which 
is sufficiently small with respect to his computational resource. It is well-known 
(e.g., the work of van Oorschot and Wiener on the parallelised rho method [13]) 
that parallelisation is effective for extracting small discrete logarithms. So the 
resourceful party (one who is able to afford vast parallelisation) can abort the 
exchange at that point and gain an advantage unfairly. Boneh and Naor suggested 
sealing signatures under exchange using elements in $L(a,n)$. Given the 
aforementioned non-parallelisable property for re-constructing the elements in 
$L(a,n)$, a roughly equal time can be imposed on both parties to open the sealed 
signatures regardless of their difference (maybe vast) in computing resources. In 
this way, they argued that strong fairness for contract signing can be achieved. 

Rivest et al suggested several other applications of timed-release cryptogra- 
phy [11]: 

— A bidder in an auction wants to seal his bid so that it can only be opened 
after the bidding period is closed. 

— A homeowner wants to give his mortgage holder a series of encrypted mort- 
gage payments. These might be encrypted digital cash with different decryp- 
tion dates, so that one payment becomes decryptable (and thus usable by 
the bank) at the beginning of each successive month. 

— A key-escrow scheme can be based on timed-release crypto, so that the gov- 
ernment can get the message keys, but only after a fixed, pre-determined 
period. 

— An individual wants to encrypt his diaries so that they are only decryptable 
after fifty years (when the individual may have forgotten the decryption key). 

1.2 Previous Work and Unsolved Problem 

With the nice properties of $L(a, n)$ we are only half way to the realisation of
timed-release cryptography. In most imaginable applications where timed-release 
crypto may play a role, it is necessary for a problem constructor to prove (ideally 
in zero-knowledge) the correct construction of the problem. For example, without 
a correctness proof, the strong fairness property of the fair-exchange application 
is actually absent. 

From the problem's membership in NP we know that there exists a zero-knowledge
proof for a membership assertion regarding the language $L(a,n)$. Such a
proof can be constructed via a general method (e.g., the work of Goldreich et al
[9]). However, the performance of a zero-knowledge proof in a general construction
is not suitable for practical use. By performance suitable for practical use we
mean an efficiency measured by a small polynomial in some typical parameters
(e.g., the bit length of $n$). To our knowledge, there exist no practically
efficient zero-knowledge protocols for proving the general case of membership
in $L(a, n)$.

Boneh and Naor constructed a practically efficient protocol for proving
membership in a sub-language of $L(a, n)$ where $t = 2^k$ with $k$ being any natural
number. The time control that the elements in this sub-language can offer has
the granularity 2. We know that the time complexity in bit operations for
performing one squaring modulo $n$ can be expressed by the lowest known result of
$c \cdot \log n \cdot \log\log n$ (where $c > 1$ is a machine-dependent value; a faster
machine has a smaller $c$) if FFT (fast Fourier transform) is used for the
implementation of squaring. Thus, the time complexity for computing elements in
this sub-language is the step function

$$2^k \cdot c \cdot \log n \cdot \log\log n$$

which has a fast increasing step when $k$ gets large. Boneh and Naor envisioned
$k \in [30, 50]$ for typical cases in applications. While it is evident that $k$
decreasing from 30 downwards will quickly trivialise a timed-release crypto problem,
as $2^{30}$ is already at the level of a small polynomial in the secure bit length
of $n$ (usually $2^{10}$), a $k$ increasing from 30 upwards will harden the problem in
such increasingly giant steps that imaginable services (e.g., the strong fairness for
gradual disclosure of secret proposed in [3]) will quickly become unattractive or
unusable. Taking the LCS35 Time Capsule for example, let the 35-year-opening-time
capsule be in that sub-language (so the correctness can be efficiently proved
with the protocol in [3]); then the only other elements in that sub-language with
opening times close to 35 years will be 17.5 years and 70 years. We should notice
that there is no hope of tuning the size of $n$ as a means of tuning the time
complexity, since changing $c \cdot \log n \cdot \log\log n$ will have little impact on
the above giant step function.
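
The coarse granularity can be illustrated with a few lines of Python. This is only a sketch: the function name `opening_times` is hypothetical, and it assumes that the per-squaring cost is fixed, so that the opening time is proportional to the number of squarings $2^k$.

```python
def opening_times(base_time, k_shifts=(-1, 0, 1)):
    """Opening times reachable in the sub-language t = 2^k around a base time.

    Changing k by one doubles or halves the number of squarings, and hence
    the opening time, because the per-squaring cost is assumed to be fixed.
    """
    return [base_time * 2.0 ** shift for shift in k_shifts]


# Neighbours of a 35-year capsule when only t = 2^k is available.
print(opening_times(35))
```

No opening time between the listed values is available when $t$ is restricted to powers of two.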

Boneh and Naor expressed a desire for a finer time-control ratio than 2 and
sketched a method to obtain a finer ratio with starting value 1 and
$t_i = t_{i-1} + t_{i-2}$ for $i = 1, \ldots, k$. This method of reducing the ratio
renders the ratio bounded below by $\alpha = (1+\sqrt{5})/2$ ($\approx 1.618$) while
increasing the number of proof rounds from $k$ to $\log_{\alpha} k$. They further
mentioned that smaller values can be obtained by other such recurrences. It seems
to us that if some recurrence method similar to the above is used, then with
ratio $\to 1$ (1 is the ideal ratio and will be that for our case), the
number of proof rounds $\log_{\mathrm{ratio}} k \to \infty$. So their suggested methods
for reducing the time-control ratio are not practical for obtaining a desirable ratio.

The Time-Lock-Puzzle work of Rivest et al [11] did not provide a method for 
proving the correct construction of a timed-release crypto problem. 

1.3 Our Work 

We construct the first practically efficient zero-knowledge proof protocol for
demonstrating membership in $L(a,n)$, which runs in $\log_2 t$ steps, each an
exponentiation modulo $n$, or $O(\log_2 t \cdot (\log_2 n)^3)$ bit operations in total
(without using FFT). This efficiency suits practical uses. The membership
demonstration can be conducted in terms of $(a^e)^{2^t} \pmod n \in L(a^e, n)$ on given
$a$, $a^e$ and $t$, where $e$ is an RSA encryption exponent. Then we are able to
provide two timed-release crypto primitives, one for timed release of a message,
and the other for timed release of an RSA signature. In the former, a message $M$
can be sealed in $a^{2^t} M \pmod n$, and the established membership asserts that the
correct decryption of the RSA ciphertext $M^e \pmod n$ can be obtained by performing
$t$ squarings modulo $n$ starting from $a$. The latter primitive can be constructed
analogously.

Our schemes provide general methods for the use of timed-release cryptography.

1.4 Organisation

In the next section we agree on the notation to be used in the paper. In Section 3
we construct general methods for timed-release cryptography based on proven
membership in $L(a, n)$. In Section 4 we construct our membership proof protocol
working with an RSA modulus of a safe-prime structure. In Section 5 we
discuss how to generalise our result to working with a general form of composite
modulus.



2 Notation 

Throughout the paper we use the following notation. $Z_n$ denotes the ring of
integers modulo $n$. $Z_n^*$ denotes the multiplicative group of integers modulo $n$.
$\phi(n)$ denotes Euler's phi function of $n$, which is the order, i.e., the number of
elements, of the group $Z_n^*$. For an element $a \in Z_n^*$, $\mathrm{ord}_n(a)$ denotes the
multiplicative order modulo $n$ of $a$, which is the least index $i$ satisfying
$a^i \equiv 1 \pmod n$; $\langle a \rangle$ denotes the subgroup generated by $a$;
$\left(\frac{x}{n}\right)$ denotes the Jacobi symbol of $x$ mod $n$. We
denote by $J_+(n)$ the subset of $Z_n^*$ containing the elements of positive Jacobi
symbol. For integers $a$, $b$, we denote by $\gcd(a, b)$ the greatest common divisor
of $a$ and $b$. For a real number $r$, we denote by $\lfloor r \rfloor$ the floor of $r$,
i.e., $r$ rounded down to the nearest integer.

3 Timed-Release Crypto with Proven Membership in $L(a,n)$

Let Alice be the constructor of a timed-release crypto problem. She begins by
constructing a composite natural number $n = pq$ where $p$ and $q$ are two distinct
odd prime numbers. Define

$$a(t) \stackrel{\mathrm{def}}{=} a^{2^t} \pmod n, \qquad (5)$$

$$a^e(t) \stackrel{\mathrm{def}}{=} (a(t))^e \pmod n, \qquad (6)$$

where $e$ is a fixed natural number relatively prime to $\phi(n)$ (in the position of an
RSA public exponent), and $a \not\equiv \pm 1 \pmod n$ is a random element in $Z_n^*$. Alice
can construct $a(t)$ using the steps in (2) and (3).
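
The gap between Alice's and Bob's workloads can be sketched in Python. Steps (2) and (3) are not reproduced in this section, so the sketch assumes they amount to reducing the exponent $2^t$ modulo $\phi(n)$ before exponentiating; the toy modulus, the base and the function names are illustrative choices only.

```python
P, Q = 23, 59                        # toy primes, far too small for real use
N, PHI = P * Q, (P - 1) * (Q - 1)


def a_of_t_by_squaring(a, t, n):
    """Bob's route: t sequential squarings modulo n, starting from a."""
    x = a % n
    for _ in range(t):
        x = x * x % n
    return x


def a_of_t_with_trapdoor(a, t, phi, n):
    """Alice's route (assumed form of steps (2) and (3)): reduce 2^t mod phi(n)."""
    u = pow(2, t, phi)               # needs the factorisation of n
    return pow(a, u, n)              # valid because gcd(a, n) = 1


slow = a_of_t_by_squaring(2, 1000, N)
fast = a_of_t_with_trapdoor(2, 1000, PHI, N)
print(slow == fast)
```

Both routes give the same value; only the second is available to someone who knows $\phi(n)$, and its cost grows with the bit length of $t$ rather than with $t$ itself.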

The following security requirements should be in place: $n$ should be so
constructed that $\mathrm{ord}_{\phi(n)}(2)$ is sufficiently large, and $a$ should be so
chosen that $\mathrm{ord}_n(a)$ is sufficiently large. Here, "sufficiently large" means
"much larger than $t$" for the largest possible $t$ that the system should accommodate.

In the remainder of this section, we assume that Alice has proven to Bob,
the verifier, the following membership status (using the protocol in §4):

$$a^e(t) \in L(a^e, n). \qquad (7)$$

Clearly, with $e$ co-prime to $\phi(n)$, this is equivalent to another membership status:

$$a(t) \in L(a, n).$$

However, in the latter case $a(t)$ is (temporarily) unavailable to Bob due to the
difficulty of extracting the $e$-th root (of $a^e(t)$) in the RSA group.



3.1 Timed-Release of an Encrypted Message 

For message $M < n$, to make it decryptable in time $t$, Alice can construct a
"timed encryption":

$$TE(M,t) \stackrel{\mathrm{def}}{=} a(t) M \pmod n. \qquad (8)$$

Let Bob be given the tuple $(TE(M,t), a^e(t), e, a, t, n)$ where $a^e(t)$ is
constructed in (5) and (6) and has the membership status in (7) proven by Alice.
Then from the relation

$$TE(M,t)^e \equiv a^e(t) M^e \pmod n, \qquad (9)$$

Bob is assured that the plaintext corresponding to the RSA ciphertext
$M^e \pmod n$ can be obtained from $TE(M,t)$ by performing $t$ squarings
modulo $n$ starting from $a$. We should note that in this encryption scheme, Alice is
the sender and Bob the recipient; so if Alice wants the message to be released
to Bob exclusively then she should send $a$ to Bob exclusively, e.g., via a
confidential channel.

Remark 1. As in the case of any practical public-key encryption scheme, M in (8) 
should be randomised using a proper plaintext randomisation scheme designed 
for providing the semantic security (e.g., the OAEP scheme for RSA [7]). 
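
A toy run of the timed encryption may help fix ideas. The sketch below uses a deliberately tiny modulus and omits the plaintext randomisation of Remark 1; all concrete numbers and variable names are illustrative assumptions, and the modular inverse via `pow(u, -1, N)` needs Python 3.8 or later.

```python
P, Q = 23, 59                          # toy primes, insecure by design
N, PHI = P * Q, (P - 1) * (Q - 1)
E, A, T = 3, 2, 1000                   # public exponent, base, number of squarings

# Alice (knows phi(n)): builds a(t), a^e(t) and the timed encryption.
a_t = pow(A, pow(2, T, PHI), N)
a_e_t = pow(a_t, E, N)
M = 42                                 # stands in for a randomised plaintext
TE = a_t * M % N

# The relation Bob can check once membership has been proven.
relation_holds = pow(TE, E, N) == a_e_t * pow(M, E, N) % N

# Bob (does not know phi(n)): T sequential squarings, then unseals M.
u = A
for _ in range(T):
    u = u * u % N
recovered = TE * pow(u, -1, N) % N
print(relation_holds, recovered)
```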



3.2 Timed-Release of an RSA Signature 

Let $e$, $n$ be as above and $d$ satisfy $ed \equiv 1 \pmod{\phi(n)}$ (so $d$ is in the
position of an RSA signing exponent). For message $M < n$ (see Remark 2 below), to make
its RSA signature $M^d \pmod n$ releasable in time $t$, Alice can construct a "timed
signature":

$$TS(M,t) \stackrel{\mathrm{def}}{=} a(t) M^d \pmod n. \qquad (10)$$

Let Bob be given the tuple $(M, TS(M,t), a^e(t), e, a, t, n)$ where $a^e(t)$ is
constructed in (5) and (6) and has the membership status in (7) proven by Alice.
Then from the relation

$$TS(M,t)^e \equiv a^e(t) M \pmod n, \qquad (11)$$

Bob is assured that the RSA signature on $M$ can be obtained from $TS(M, t)$ by
performing $t$ squarings modulo $n$ starting from $a$.

Remark 2. As in the case of a practical digital signature scheme, in order to
prevent existential forgery of a signature, $M$ in (10) should denote an output
from a cryptographically secure one-way hash function. If we further require the
signature to have an indistinguishability property (see §3.3), then the hashed result
should be in $J_+(n)$. Padding $M$ with a random string and then hashing, the
probability for the hashed result to be in $J_+(n)$ is 0.5.
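
The timed signature can be exercised in the same toy setting. In this sketch the integer `M` merely stands in for a hash output, the Jacobi-symbol condition of Remark 2 is not enforced, and all concrete values and names are illustrative assumptions (Python 3.8 or later for the modular inverses).

```python
P, Q = 23, 59                          # toy primes, insecure by design
N, PHI = P * Q, (P - 1) * (Q - 1)
E, A, T = 3, 2, 1000
D = pow(E, -1, PHI)                    # RSA signing exponent, e*d = 1 mod phi(n)

# Alice: signs M and seals the signature with a(t).
M = 99                                 # stands in for a hash output
signature = pow(M, D, N)
a_t = pow(A, pow(2, T, PHI), N)
a_e_t = pow(a_t, E, N)
TS = a_t * signature % N

# The relation Bob can check before doing any squaring.
relation_holds = pow(TS, E, N) == a_e_t * M % N

# Bob: T squarings from A, then unseals and verifies the RSA signature.
u = A
for _ in range(T):
    u = u * u % N
unsealed = TS * pow(u, -1, N) % N
print(relation_holds, pow(unsealed, E, N) == M)
```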

3.3 Security Analysis 

Confidentiality of $M$ in $TE(M,t)$. We assume that Alice has properly implemented
our security requirements on the large magnitudes of $\mathrm{ord}_{\phi(n)}(2)$ and
$\mathrm{ord}_n(a)$. Then we observe that $L(a, n)$ is a large subset of the quadratic
residues modulo $n$, and the mapping $a \mapsto a(t)$ is one-way under the appropriate
intractability assumption (here, integer factorisation). Consequently, our scheme
for encrypting $M \in Z_n^*$ in $TE(M,t)$ is a trapdoor one-way permutation since it
is the multiplication, modulo $n$, of the message $M$ by the trapdoor secret $a(t)$.
In fact, from (9) we see that the availability of $TE(M,t)^e$ and $a^e(t)$ makes $M^e$
available, and so, leaving aside the route of going through $t$ squarings, the
underlying intractability of $TE(M,t)$ is reduced to that of RSA. Therefore, well-known
plaintext randomisation schemes for RSA encryption (e.g., OAEP [7]), which
have been proposed for achieving semantic security (against adaptive chosen
ciphertext attacks), can be applied to our plaintext message before the application
of the permutation. The message confidentiality properties (i.e., the
indistinguishability and non-malleability of the message $M$) of our timed-release
encryption scheme should follow directly from those of RSA-OAEP.

Thus, given the difficulty of extracting the $e$-th root of a random element
modulo $n$, a successful extraction of $a(t)$ from $a^e(t)$, or of some information
regarding $M$ from $TE(M,t)$, will constitute a grand breakthrough in the area if
it is done at a cost less than $t$ squarings modulo $n$.

Unforgeability of $M^d$ in $TS(M,t)$. First, recall that $M$ here denotes an
output from a cryptographically secure one-way hash function before signing in
the RSA fashion. The unforgeability of $M^d$ in $TS(M,t)$ follows directly from that of
$M^d \pmod n$ given in clear.

Secondly, the randomness of $a^e(t)$ ensures that of $TS(M,t)^e$. Thus the
availability of the pair $(TS(M,t), TS(M,t)^e)$ does not constitute a valid signature of
Alice on anything (such as on an adaptively chosen message). The availability
of the pair $(TS(M,t), TS(M,t)^e)$ is equivalent to that of $(x, x^e)$, which can be
constructed by anybody using a random $x$.



Indistinguishability of $M^d$ in $TS(M,t)$. The indistinguishability is the
following property: with the timed-release signature $TS(M,t)$ on $M$ and with the
proven membership $a^e(t) \in L(a^e, n)$, but without going through $t$ squarings mod
$n$, one should not be able to tell whether $TS(M,t)$ has any verifiable relationship
with a signature on $M$. This property should hold even if the signature
pair $(M, M^d)$ becomes available; namely, even if Bob has recovered the signature
pair $(M, M^d)$ (e.g., after having performed $t$ squarings), he is still not able
to convince a third party that $TS(M,t)$ is a timed-release signature of Alice on
$M$. This property is shown below.

Let $\tilde{M} \in J_+(n)$ be any message of Bob's choice (e.g., Bob may have chosen
it because $\tilde{M}^d$ may be available to him from a different context). We have

$$TS(M, t) = a(t) M^d = a(t) \left(\frac{M}{\tilde{M}}\right)^d \tilde{M}^d = \tilde{a} \tilde{M}^d \pmod n,$$

where $\tilde{a} = a(t) (M/\tilde{M})^d \pmod n$. So upon seeing Bob's allegation of a
"verifiable relationship" between $TS(M,t)$ and $M^d$, the third party faces the problem
of deciding which of $M^d$ or $\tilde{M}^d$ is sealed in $TS(M, t)$. This boils down to
deciding if $a(t) \in L(a, n)$ or if $\tilde{a} \in L(a, n)$ (both are in $J_+(n)$), which
is still a problem of going through $t$ squarings. Thus, even though the availability
of $M^d$ and $\tilde{M}^d$ does allow one to recognise that both are in fact Alice's
valid signatures, without verifying the membership status, one
is unable to tell if either of the two has any connection with $TS(M, t)$ at all.



4 Membership Proof with Modulus of a Safe-Prime Structure

Let Alice have constructed her RSA modulus $n$ with a safe-prime structure.
This requires $n = pq$, $p' = (p - 1)/2$, $q' = (q - 1)/2$ where $p$, $q$, $p'$ and $q'$ are all
distinct primes of roughly equal size. We assume that Alice has proven to Bob
in zero-knowledge such a structure of $n$. This can be achieved using, e.g., the
protocol of Camenisch and Michels [4].¹

Let $a \in Z_n^*$ satisfy

$$\gcd(a \pm 1, n) = 1, \qquad (12)$$

$$\left(\frac{a}{n}\right) = -1. \qquad (13)$$

It is elementary to show that $a$ satisfying (12) and (13) has the full order $2p'q'$.
The following lemma observes a property of $a$.

Lemma 1. Let $n$ be an RSA modulus of a safe-prime structure and $a \in Z_n^*$ of
the full order. Then for any $x \in Z_n^*$, either $x \in \langle a \rangle$ or $-x \in \langle a \rangle$.

Proof. It is easy to check that $-1 \notin \langle a \rangle$. So $\langle a \rangle$ and the coset
$(-1)\langle a \rangle$ both have half the size of $Z_n^*$, yielding
$Z_n^* = \langle a \rangle \cup (-1)\langle a \rangle$. Any $x \in Z_n^*$ is either in
$\langle a \rangle$ or in $(-1)\langle a \rangle$. The latter case means $-x \in \langle a \rangle$. □
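
Lemma 1 can be checked exhaustively on a toy modulus. The following sketch assumes the safe primes 23 and 59 and the base $a = 2$, which are illustrative choices only; the code itself verifies that this $a$ has the full order and that $-1$ lies outside the subgroup it generates.

```python
from math import gcd

P, Q = 23, 59                         # toy safe primes: 23 = 2*11 + 1, 59 = 2*29 + 1
N = P * Q
FULL_ORDER = 2 * 11 * 29              # 2 p' q'
A = 2                                 # candidate element of full order

# Enumerate the subgroup generated by A.
subgroup = set()
x = 1
while True:
    x = x * A % N
    subgroup.add(x)
    if x == 1:
        break

units = [v for v in range(1, N) if gcd(v, N) == 1]
lemma_holds = all(v in subgroup or (N - v) in subgroup for v in units)
print(len(subgroup), (N - 1) in subgroup, lemma_holds)
```

The subgroup covers exactly half of the units, and every unit or its negative falls inside it.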

¹ Due to the current difficulty of zero-knowledge proof for a safe-prime-structured RSA
modulus, we recommend using the method in Section 5, which works with a general
form of composite modulus. The role of Section 4 is to give a clear exposition of
how we solve the current problem in timed-release cryptography.

4.1 A Building Block Protocol

Let Alice and Bob have agreed on $n$ (this is based on Bob's satisfaction with Alice's
proof that $n$ has a safe-prime structure).

Figure 1 specifies a perfect zero-knowledge protocol (SQ) for Alice to prove
that for $a, x, y \in Z_n^*$ with $n$ of a safe-prime structure, $a$ of the full order, and
$x, y \in J_+(n)$, they satisfy (note, $\pm$ below means either $+$ or $-$, but not both)

$$\exists z : x \equiv \pm a^z \pmod n, \quad y \equiv \pm a^{z^2} \pmod n. \qquad (14)$$

Alice should of course have constructed $a, x, y$ to satisfy (14). She sends $a, x, y$
to Bob.

Bob (has checked $n$ of a safe-prime structure) should first check (12) and
(13) on $a$ for its full-order property (the check guarantees $a \not\equiv \pm 1 \pmod n$); he
should also check $x, y \in J_+(n)$.



SQ(a, x, y, n)

Input  Common: $n$: an RSA modulus with a safe-prime structure;
               $a \in Z_n^*$: an element of the full order $2p'q' = \phi(n)/2$
               (so $a \not\equiv \pm 1 \pmod n$);
               $x, y \in J_+(n)$: $x \not\equiv \pm y \pmod n$;
       Alice:  $z$: $x \equiv \pm a^z \pmod n$, $y \equiv \pm a^{z^2} \pmod n$;

1. Bob chooses at random $r < n$, $s < n$ and sends to Alice:
   $C \stackrel{\mathrm{def}}{=} a^r x^s \pmod n$;
2. Alice sends to Bob: $R \stackrel{\mathrm{def}}{=} C^z \pmod n$;
3. Bob accepts if $R \equiv \pm x^r y^s \pmod n$, or rejects otherwise.

Fig. 1. Building Block Protocol
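
One round of the building block protocol can be simulated in a few lines of Python. This is a sketch under stated assumptions: it uses the toy modulus $23 \cdot 59$ with base 2, it follows the non-zero-knowledge format of Figure 1, it omits Bob's input checks, and the function name `sq_round` with its optional `r`, `s` arguments (there only to make a run reproducible) is hypothetical.

```python
import secrets

N = 23 * 59                            # toy safe-prime modulus, insecure by design
A = 2                                  # element of full order 2*11*29 = 638


def sq_round(x, y, z, r=None, s=None):
    """Simulate one run of SQ; returns True if Bob accepts."""
    # Step 1 (Bob): random challenge C = a^r x^s mod n.
    r = secrets.randbelow(N - 1) + 1 if r is None else r
    s = secrets.randbelow(N - 1) + 1 if s is None else s
    challenge = pow(A, r, N) * pow(x, s, N) % N
    # Step 2 (Alice): response R = C^z mod n.
    response = pow(challenge, z, N)
    # Step 3 (Bob): accept iff R = +/- x^r y^s mod n.
    expected = pow(x, r, N) * pow(y, s, N) % N
    return response in (expected, N - expected)


z = 8
x, y = pow(A, z, N), pow(A, z * z, N)  # honest inputs: x = a^z, y = a^(z^2)
honest_runs = [sq_round(x, y, z) for _ in range(50)]

y_bad = y * 4 % N                      # y multiplied by an element of order 319
cheat_accepted = sq_round(x, y_bad, z, r=7, s=5)
print(all(honest_runs), cheat_accepted)
```

The cheating run only models a prover who still answers with $C^z$; it is rejected here because the multiplier raised to the power $s$ is not $\pm 1$.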



Remark 3. For ease of exposition this protocol appears in a non-zero-knowledge
format. However, the zero-knowledge property can be added to it using the
notion of a commitment function: instead of Alice sending $R$ in Step 2, she
sends a commitment commit($R$), after which Bob reveals $r$ and $s$; this allows
Alice to check the correct formation of $C$; the correct formation means that Bob
has already known Alice's response.

Theorem 1. Let $a, x, y, n$ be as specified in the common input in Protocol SQ.
The protocol has the following properties:

Completeness. If the common input satisfies (14) then Bob will always accept
Alice's proof;

Soundness. If (14) does not hold for the common input, then Alice, even
computationally unbounded, cannot convince Bob to accept her proof with probability
greater than $\frac{2p' + 2q' - 1}{2p'q'}$;²

Zero-knowledge. Bob gains no information about Alice's private input.

Proof.

Completeness. Evident from inspection of the protocol.

Soundness. Suppose that (14) does not hold for the common input $(a, x, y, n)$
(here $x, y \in J_+(n)$) whereas Bob has accepted Alice's proof. By Lemma 1, the
first congruence of (14) always holds for some $z = \log_a \pm x$. So it is the second
congruence of (14) that does not hold for the same $z$. Let $\xi \in Z_n^*$ satisfy

$$y \equiv \xi a^{z^2} \pmod n \text{ with } \xi \neq \pm 1. \qquad (15)$$

Since Bob accepts the proof, he sees the following two congruences

$$C \equiv a^r x^s \pmod n, \qquad (16)$$

$$R \equiv \pm x^r y^s \pmod n. \qquad (17)$$

Since (16) implies

$$C^2 \equiv a^{2r} (x^2)^s \pmod n,$$

and by Lemma 1, both $\log_a C^2$ and $\log_a x^2$ ($= \log_a (\pm x)^2 = 2z$) exist, we can
write the following linear congruence with $r$ and $s$ as unknowns

$$\log_a C^2 \equiv 2r + 2zs \pmod{2p'q'}.$$

For $s = 1, 2, \ldots, 2p'q'$, this linear congruence yields
$r \equiv \frac{\log_a C^2}{2} - zs \pmod{2p'q'}$.
Therefore there exist exactly $2p'q'$ pairs of $(r, s)$ to satisfy (16) for any fixed $C$
(and the fixed $a$, $x$). Each of these pairs and the fixed $x$, $y$ will yield an $R$ from
(17). Below we argue that for any two such pairs, denoted by $(r, s)$ and $(r', s')$,
if $\gcd(s - s', 2p'q') \le 2$ then they must yield $R \not\equiv \pm R' \pmod n$. Suppose on the
contrary for

$$a^r x^s \equiv C \equiv a^{r'} x^{s'} \pmod n, \text{ i.e., } a^{r-r'} \equiv x^{s'-s} \pmod n, \qquad (18)$$

it also holds

$$\pm x^r y^s \equiv R \equiv \pm R' \equiv \pm x^{r'} y^{s'} \pmod n, \text{ i.e., } x^{r-r'} \equiv \pm y^{s'-s} \pmod n. \qquad (19)$$

Using the second congruence in (18), noticing $x \equiv \pm a^z$ and (15), we can
transform the second congruence in (19) to

$$(\pm 1)^{[r-r'+z(s'-s)]} a^{[z^2(s'-s)]} \equiv x^{r-r'} \equiv \pm y^{s'-s} \equiv \pm \xi^{(s'-s)} a^{[z^2(s'-s)]} \pmod n,$$

which yields

$$\pm \xi^{(s'-s)} \equiv (\pm 1)^{[r-r'+z(s'-s)]}, \text{ i.e., } \xi^{2(s'-s)} \equiv 1 \pmod n. \qquad (20)$$

Recall that $\xi \neq \pm 1$ and $y \equiv \xi a^{z^2} \pmod n$ with $x, y \in J_+(n)$; we know
$\mathrm{ord}_n(\xi) \neq 2$ (i.e., $\xi$ cannot be any square root of 1, since the two roots
$\neq \pm 1$ will render $y \notin J_+(n)$). Thus, $\mathrm{ord}_n(\xi)$ must be a multiple of $p'$
or $q'$ or both. However, we have assumed $\gcd(s' - s, 2p'q') \le 2$, i.e.,
$\gcd(2(s' - s), 2p'q') = 2$, so $2(s' - s)$ cannot be such a multiple. Consequently (20)
cannot hold and we reach a contradiction.

For any $s \le 2p'q'$, it is routine to check that there are $2p' + 2q' - 2$ cases of
$s'$ satisfying $\gcd(2(s' - s), 2p'q') > 2$. Thus, if (14) does not hold, amongst $2p'q'$
possible $R$'s matching the challenge $C$, there are in total $2p' + 2q' - 1$ of them
(matching the $s$ itself and the $2p' + 2q' - 2$ other $s'$s) that may collide with Bob's
fixing of $R$. Even computationally unbounded, Alice will have at best
$\frac{2p' + 2q' - 1}{2p'q'}$ probability to have responded with a correct $R$.

Zero-Knowledge. Immediate (see Remark 3). □

² The safe-prime structure of $n$ implies $p' \approx q' \approx \sqrt{n}$ and hence this
probability value is approximately $2/\sqrt{n}$.



4.2 Proof of Membership in $L(a, n)$

For $t > 1$, we can express $2^t$ as

$$2^t = \begin{cases} 2^{[2 \cdot (t/2)]} = [2^{(t/2)}]^2 & \text{if } t \text{ is even} \\ 2^{[2 \cdot (t-1)/2 + 1]} = [2^{(t-1)/2}]^2 \cdot 2 & \text{if } t \text{ is odd} \end{cases}$$

Copying this expression to the exponent position of $a^{2^t} \pmod n$, we can express

$$a^{2^t} \pmod n = \begin{cases} a^{[2^{(t/2)}]^2} & \text{if } t \text{ is even} \\ \left(a^{[2^{(t-1)/2}]^2}\right)^2 & \text{if } t \text{ is odd} \end{cases} \qquad (21)$$

In (21) we see that the exponent $2^t$ can be expressed as the square of another
power of 2 with $t$ being halved in the latter. This observation suggests that by
repeatedly using SQ, we can demonstrate, in $\lfloor \log_2 t \rfloor$ steps, that the discrete
logarithm of an element is of the form $2^t$. This observation translates precisely into
the protocol specified in Figure 2, which will terminate within $\lfloor \log_2 t \rfloor$ steps and
prove the correct structure of $a(t)$. The protocol is presented in three columns:
the actions in the left column are performed by Alice, those in the right column
by Bob, and those in the middle by both parties.

A run of Membership($a, t, a(t), n$) will terminate within $\lfloor \log_2 t \rfloor$ loops, and
this is the completeness property. The zero-knowledge property follows from that of
SQ (also note Remark 4(ii) below). We only have to show the soundness property.



Theorem 2. Let $n = (2p' + 1)(2q' + 1)$ be an RSA modulus of a safe-prime
structure, $a \in Z_n^*$ be of the full order $2p'q'$, and $t > 1$. Upon acceptance
termination of Membership($a, t, a(t), n$), relation $a(t) \equiv \pm a^{2^t} \pmod n$ holds with
probability greater than

$$1 - \frac{\lfloor \log_2 t \rfloor (2p' + 2q' - 1)}{2p'q'}.$$

Membership(a, t, a(t), n)

Abort and reject if any checking by Bob fails, or accept upon termination.

  Both:   $u \stackrel{\mathrm{def}}{=} a(t)$;
  Bob:    checks $u \in J_+(n)$ and $a \not\equiv \pm u \pmod n$;

  While $t > 1$ do
    Alice:  $y \stackrel{\mathrm{def}}{=} u$;
            if $t$ is odd: $y \stackrel{\mathrm{def}}{=} a(t - 1)$;
            $x \stackrel{\mathrm{def}}{=} a(\lfloor t/2 \rfloor)$;
            sends $x$, $y$ to Bob;
    Bob:    receives $x$, $y$ from Alice;
            checks $x, y \in J_+(n)$;
            if $t$ is odd: checks $y^2 \equiv u \pmod n$;
            if $t$ is even: checks $y \equiv u \pmod n$;
    Both:   SQ($a, x, y, n$);
            $u \stackrel{\mathrm{def}}{=} x$; $t \stackrel{\mathrm{def}}{=} \lfloor t/2 \rfloor$;

  When $t = 1$:
    Bob:    checks $u \equiv a^2 \pmod n$;

Fig. 2. Membership Proof Protocol
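
The loop of Figure 2 can be traced with a small simulation. The sketch below is illustrative only: it reuses the toy modulus $23 \cdot 59$ with base 2, plays both parties in one process, runs SQ in its non-zero-knowledge format, omits the Jacobi-symbol checks, and assumes that $t$ is halved at the end of each loop. All function names are hypothetical.

```python
import secrets

P, Q = 23, 59                                  # toy safe primes, insecure by design
N, PHI = P * Q, (P - 1) * (Q - 1)
A = 2                                          # full order 638 for this modulus


def a_of(t):
    """Alice's shortcut for a(t) = A^(2^t) mod N, using phi(N)."""
    return pow(A, pow(2, t, PHI), N)


def sq(x, y, z):
    """One SQ call: Bob challenges, Alice answers C^z, Bob checks +/- x^r y^s."""
    r, s = secrets.randbelow(N - 1) + 1, secrets.randbelow(N - 1) + 1
    challenge = pow(A, r, N) * pow(x, s, N) % N
    response = pow(challenge, z, N)
    expected = pow(x, r, N) * pow(y, s, N) % N
    return response in (expected, N - expected)


def membership(t, claimed):
    """Return (accepted, number_of_SQ_calls) for the claim claimed = a(t)."""
    u = claimed % N
    if u in (A % N, N - A):                    # Bob: a must differ from +/- u
        return False, 0
    calls = 0
    while t > 1:
        y = u if t % 2 == 0 else a_of(t - 1)   # Alice chooses y
        x = a_of(t // 2)                       # Alice chooses x
        ok = (y == u) if t % 2 == 0 else (pow(y, 2, N) == u)  # Bob's check
        z = pow(2, t // 2, PHI)                # Alice's private exponent
        if not ok or not sq(x, y, z):
            return False, calls
        calls += 1
        u, t = x, t // 2
    return u == pow(A, 2, N), calls            # final check when t = 1


accepted, sq_calls = membership(1000, a_of(1000))
rejected = membership(37, (a_of(37) + 1) % N)[0]
print(accepted, sq_calls, rejected)
```

For $t = 1000$ the honest run makes $\lfloor \log_2 1000 \rfloor = 9$ calls of SQ; the wrong claim for the odd $t = 37$ already fails Bob's squaring check in the first loop.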



Proof. Denote by SQ($a, x_1, y_1, n$) and by SQ($a, x_2, y_2, n$) any two consecutive
acceptance calls of SQ in Membership (so in the first call, $y_1 = a(t)$ if $t$ is even,
or $y_1 = a(t - 1)$ if $t$ is odd; and in the last call, $x_2 = a^2$). When $t > 1$, such two
calls prove that there exists $z$:

$$x_2 \equiv \pm a^z \pmod n, \quad y_2 \equiv \pm a^{z^2} \pmod n, \qquad (22)$$

and either

$$x_1 = y_2 \equiv \pm a^{z^2} \pmod n, \quad y_1 \equiv \pm a^{z^4} \pmod n, \qquad (23)$$

or

$$x_1 = y_2^2 \equiv a^{2z^2} \pmod n, \quad y_1 \equiv \pm a^{4z^4} \pmod n. \qquad (24)$$

Upon $t = 1$, Bob further sees that $x_2 = a^2$. By induction, the exponents $z$ (resp.
$z^2$, $z^4$, $2z^2$, $4z^4$) in all cases of $\pm a^z$ (resp. $\pm a^{z^2}, \ldots$) in (22), (23)
or (24) contain a single factor: 2. So we can write $a(t) \equiv \pm a^{2^u} \pmod n$ for some
natural number $u$.

Further note that each call of SQ causes an effect of having $2^u$ square-rooted
in the integers, which is equivalent to having $u$ halved in the integers. Thus,
exactly $\lfloor \log_2 u \rfloor$ calls (and no more) of SQ can be made. But Bob has counted
$\lfloor \log_2 t \rfloor$ calls of SQ, therefore $u = t$.

Each acceptance call of SQ has the correctness probability of
$1 - \frac{2p' + 2q' - 1}{2p'q'}$.
So after $\lfloor \log_2 t \rfloor$ acceptance calls of SQ, the probability for Membership to be
correct is

$$\left(1 - \frac{2p' + 2q' - 1}{2p'q'}\right)^{\lfloor \log_2 t \rfloor} > 1 - \frac{\lfloor \log_2 t \rfloor (2p' + 2q' - 1)}{2p'q'}. \qquad \Box$$

Remark 4.

i) An acceptance run of Membership($a, t, a(t), n$) proves $\pm a(t) \in L(a,n)$, or
$a^2(t) = a(t + 1) \in L(a,n)$.

ii) It is obvious that by preparing all the intermediate values in advance,
Protocol Membership can be run in parallel to save the $\lfloor \log_2 t \rfloor$ rounds of
interactions. This way of parallelisation should not be confused with another
common method for parallelising a proof of knowledge protocol using a hash
function to create challenge bits (which turns the proof publicly verifiable).
Our parallelisation does not damage the zero-knowledge property.

iii) In most applications, $a(t)$ is the very number (solution to a puzzle) that
should not be disclosed to Bob during the proof time. In such a situation,
Alice should choose $t$ to be even and render $a(t - 1)$ to be the solution to a
puzzle. Then a proof of Membership($a, t, a(t), n$) will not disclose $a(t - 1)$.
Note that such a proof does disclose to Bob $a(\lfloor t/2 \rfloor)$, which provides Bob
with a complexity of $\lfloor t/2 \rfloor - 1$ squarings to reach $a(t - 1)$. To compensate
for the loss of computation, a proof of Membership($a, 2t, a(2t), n$) is necessary.
Consequently, the proof runs one loop more than Membership($a, t, a(t), n$)
does. Note that the above precautions are unnecessary for our applications
in §3 where it is the $e$-th root of $a^e(t)$ that is the puzzle's solution; the
disclosures of $a^e(t)$ or $a^e(\lfloor t/2 \rfloor)$ do not seem to reduce the time complexity
for finding $a(t)$.

4.3 Performance 

In each run of SQ, Alice (resp. Bob) performs one (resp. four) exponentiation(s)
mod $n$. So in Membership($a, t, a(t), n$) Alice (resp. Bob) will perform $\lfloor \log_2 t \rfloor$
(resp. $4 \lfloor \log_2 t \rfloor$) exponentiations mod $n$. These translate to
$O(\lfloor \log_2 t \rfloor (\log_2 n)^3)$ bit operations.

In the LCS35 Time Capsule Crypto-Puzzle [10], $t = 79685186856218$ is a
47-bit binary number. Thus the verification for that puzzle can be completed
within $4 \times 47 = 188$ exponentiations mod $n$.

The number of bits to be exchanged is measured by $O(\lfloor \log_2 t \rfloor \cdot \log_2 n)$.
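
The LCS35 figures can be re-derived directly. The short sketch below only uses Python's built-in integer arithmetic; the variable names are illustrative.

```python
t = 79685186856218                 # number of squarings in the LCS35 puzzle

bits = t.bit_length()              # length of t in binary
floor_log2_t = bits - 1            # exact value of floor(log2 t)
bob_bound = 4 * bits               # the bound on Bob's exponentiations quoted in the text

print(bits, floor_log2_t, bob_bound)
```

Since $\lfloor \log_2 t \rfloor$ is one less than the bit length, counting 47 loops is a safe upper bound on the number of SQ calls.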

5 Use of Modulus of a General Form 

When $n$ does not have a safe-prime structure, the error probability of SQ can
be much larger than what we have measured in Theorem 1. The general method
for Alice to introduce an error in her proof (i.e., to cheat) is to fix $y$ in (15)
with some $\xi \neq \pm 1$. For $y$ so fixed before Bob's challenge $C$, Bob is actually
awaiting $R \equiv \pm \xi^s C^z \pmod n$, in which $\xi^s \pmod n$ is the only value that
Alice does not know (she does not know it because Bob's random choice of $s$
is perfectly hidden in $C$). Therefore in order to respond with the correct $R$, it
is both necessary and sufficient for Alice to guess $\xi^s \pmod n$ correctly. Notice
that while it is unnecessary and can be too difficult for Alice to guess $s$, guessing
$\xi^s \pmod n$ need not be very difficult and the probability of a correct guess is
bounded by $1/\mathrm{ord}_n(\xi)$. Thus, in order for Alice to achieve a large error
probability (meaning, to ease her cheating), she should use $\xi$ of a small order.

The above cheating scenario provides the easiest method for Alice to cheat
and yet is general enough for covering the cases that the soundness of SQ should
consider. Multiplying both $x$ and $y$ by some small-order elements will only
make the cheating job more difficult. Therefore it suffices for us to anticipate
the above general cheating method.

To this end it becomes apparent that in order to limit Alice's cheating
probability we should prevent her from constructing $y$ in (15) using $\xi$ of a small order.
Using a safe-prime-structured modulus $n = (2p' + 1)(2q' + 1)$ achieves this purpose
exactly because then the least order available to Alice is $\min(p', q')$, which
keeps her cheating probability satisfactorily small (using $\xi$ of order 2 either does
not constitute an attack, or will cause detection of $y \notin J_+(n)$).

While a zero-knowledge proof of $n$ being in a safe-prime structure is
computationally inefficient to date, it is rather easy to construct a zero-knowledge
proof protocol for proving that $\phi(n)$ is free of small odd prime factors up to a
bound $B$. Boyar et al [2] constructed a practically efficient zero-knowledge proof
protocol for proving that $\phi(n)$ is relatively prime to $n$. As in [8], we can apply
the same idea to prove that $\phi(n)$ is relatively prime to $A$ (i.e., using $A$ in place
of $n$) where

$$A = \prod_{\text{primes } i:\ 2 < i < B} i. \qquad (25)$$

Supposing that $n$ is a Blum integer (which can be efficiently proved using,
e.g., the protocol of van de Graaf and Peralta [12]), then after applying the
protocol of Boyar et al using $A$ in (25) in place of $n$, we can be sure that
the error probability of SQ is bounded by $B^{-1}$. Notice that the multiplication
attack using the square roots of 1 with the negative Jacobi symbol (in place of $\xi$
in (15)) is not possible since that will be detected by the Jacobi symbol checking
conducted on the input values. Thus, if Alice is required to repeat running SQ
$k/\log_2 B$ times, then Bob is sure that her cheating probability (i.e., for (14) not to
hold) is bounded by $2^{-k}$.



5.1 Performance of Membership Proof Using General Form of Modulus

With the soundness probability of SQ bounded by $1/B$ for each case of $x, y$,
SQ($a, x, y, n$) needs to be run $k/\log_2 B$ times to achieve an acceptable soundness
probability $2^{-k}$. Thus in Membership, SQ is run $\frac{\lfloor \log_2 t \rfloor k}{\log_2 B}$ times.
In each run of SQ, Alice (resp. Bob) performs one (resp. four) exponentiation(s) mod
$n$, so in Membership($a, t, a(t), n$) Alice (resp. Bob) will perform
$\frac{\lfloor \log_2 t \rfloor k}{\log_2 B}$ (resp. $\frac{4 \lfloor \log_2 t \rfloor k}{\log_2 B}$)
exponentiations mod $n$. Adding to this is the cost of running $k$ times
the protocol of Boyar et al; each run of that protocol costs one modular
exponentiation for both parties. Thus, the total cost in number of exponentiations mod
$n$ of the membership proof for Alice is

$$\frac{\lfloor \log_2 t \rfloor k}{\log_2 B} + k$$

and that for Bob is

$$\frac{4 \lfloor \log_2 t \rfloor k}{\log_2 B} + k.$$

In the LCS35 Time Capsule Crypto-Puzzle [10] where $\lfloor \log_2 t \rfloor = 47$, if we
consider $B = 2^{10}$ and $k = 100$, then the quantity for Alice is 570 and that for Bob is
1980. Therefore, the LCS35 Time Capsule Crypto-Puzzle using a general-form
modulus (Blum integer) can be verified with 1980 modular exponentiations.

Zero-knowledge proof of a Blum integer using the protocol in [12] has a
performance similar to one modular exponentiation for Alice; the workload of that
protocol for Bob is trivial since it only involves multiplications and evaluations of
Jacobi symbols. Thus, considering the same low soundness probability of $2^{-100}$,
we should add 100 modular exponentiations to Alice's workload to reach 670
modular exponentiations.
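
The arithmetic behind these counts can be reproduced as follows. The function name and parameter names are hypothetical, and the sketch assumes that $k$ is a multiple of $\log_2 B$ so that the number of SQ repetitions is an integer.

```python
def membership_cost(loops, k, log2_b):
    """Exponentiation counts (Alice, Bob) for the general-modulus membership proof.

    loops  : number of SQ positions in Membership (47 is used for LCS35)
    k      : security parameter, target soundness error 2^-k
    log2_b : log2 of the bound B on excluded small prime factors
    """
    sq_runs = loops * k // log2_b      # each SQ is repeated k / log2(B) times
    alice = sq_runs + k                # one exponentiation per SQ, plus k Boyar et al runs
    bob = 4 * sq_runs + k              # four exponentiations per SQ, plus k Boyar et al runs
    return alice, bob


alice_cost, bob_cost = membership_cost(47, 100, 10)
alice_with_blum_proof = alice_cost + 100   # adding the Blum-integer proof for Alice
print(alice_cost, bob_cost, alice_with_blum_proof)
```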



6 Conclusion 

We have constructed an efficient zero-knowledge protocol for providing general 
solutions to timed-release cryptographic problems (encryption and signature). 
These schemes have proven correctness on time control which can be fine tuned 
to the granularity in number of multiplications. 

Successful timed-release cryptographic problems have been constructed upon
integer-factoring based intractability. An important feature that such
intractability offers is non-parallelisability. An open question is whether other
intractability assumptions can offer this feature. (We know that the problem of
extracting discrete logarithms can be parallelised [13].)